As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.
This Code of Conduct is adapted from the Contributor Covenant (http://contributor-covenant.org), version 1.0.0, available at http://contributor-covenant.org/version/1/0/0/
MIT License
Copyright (c) 2018 Joshua D Campbell
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
vignettes/articles/celda_pbmc3k.Rmd
Celda is a Bayesian hierarchical model that can perform bi-clustering of features into modules and observations into subpopulations. In this tutorial, we will apply Celda to a real-world single-cell RNA sequencing (scRNA-seq) dataset of 2,700 Peripheral Blood Mononuclear Cells (PBMCs) collected from a healthy donor. This dataset (PBMC3K) is available from 10X Genomics and can be found on the 10X website.
The celda package uses the SingleCellExperiment (SCE) object for management of expression matrices, feature/cell annotation data, and metadata. All of the functions have an SCE object as the first input parameter. The functions operate on a matrix stored in the assay slot of the SCE object. The parameter useAssay can be used to specify which matrix to use (the default is "counts"). Matrices can be of class matrix or dgCMatrix from the Matrix package. While the primary clustering is performed with functions from the celda package, the singleCellTK package is used for some other tasks such as importing data, quality control, and marker identification with differential expression.
The PBMC3K data can be easily loaded via the Bioconductor package TENxPBMCData. TENxPBMCData is an experiment package that provides resources for various PBMC datasets generated by 10X Genomics. When using this package, the column names of the returned SCE object are NULL by default. For this example, we paste together the name of the sample with the cell barcode to generate column names for the SCE object. Additionally, the count matrix within the sce object is converted from a DelayedMatrix object to a sparse dgCMatrix object.
library(TENxPBMCData)
sce <- TENxPBMCData("pbmc3k")
colnames(sce) <- paste0("pbmc3k_", colData(sce)$Sequence)
counts(sce) <- as(counts(sce), "dgCMatrix")
If you have the singleCellTK package installed, then this dataset can be imported and converted with a single command:
library(singleCellTK)
sce <- importExampleData("pbmc3k")
To get your own data into a SingleCellExperiment object, the singleCellTK package has several importing functions for different preprocessing tools including CellRanger, STARsolo, BUStools, Optimus, DropEST, SEQC, and Alevin/Salmon. For example, the following code can be used as a template to read in multiple samples processed with CellRanger:
library(singleCellTK)
sce <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"))
Note: As a reminder, you can view the assays, column annotation, and row annotation stored in the SCE with the commands assays(sce), colData(sce), and rowData(sce), respectively.
Finally, we set the rownames of the SCE to the gene symbol:
rownames(sce) <- rowData(sce)$Symbol_TENx
Quality control and filtering of cells is often needed before downstream analyses such as dimensionality reduction and clustering. Typical filtering procedures include exclusion of poor quality cells with low numbers of counts/UMIs, estimation and removal of ambient RNA, and identification of potential doublets/multiplets. Many tools and packages are available to perform these operations and users are free to apply their tool(s) of choice, as the celda clustering functions will work with any matrix stored in an SCE object. The celda package does contain a Bayesian method called decontX to estimate and remove transcript contamination in individual cells in a scRNA-seq dataset.
To perform QC, we suggest using the runCellQC function in the singleCellTK package. This is a wrapper for several methods for calculation of QC metrics, doublet detection, and estimation of ambient RNA (including decontX). Below is a quick example of how to perform standard QC before applying celda. If you have another preferred approach or your data has already been QC’ed, you can move to the Feature selection section. For this tutorial, we will only run one doublet detection algorithm and one decontamination algorithm. For a full list of algorithms that this function runs by default, see ?runCellQC. We will also quantify the percentage of counts from mitochondrial genes in each cell, as this is often used as a measure of cell viability.
library(singleCellTK)
# Get list of mitochondrial genes
mito.genes <- grep("^MT-", rownames(sce), value = TRUE)
# Run QC
sce <- runCellQC(sce, sample = NULL, algorithms = c("QCMetrics", "scDblFinder", "decontX"), geneSetList = list(mito=mito.genes), geneSetListLocation = "rownames")
Note: If you have cells from multiple samples stored in the SCE object, make sure to supply the sample parameter, as the QC tools need to be applied to cells from each sample individually.
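To make the mitochondrial metric concrete, here is a minimal base R sketch of how a per-cell mitochondrial percentage can be computed from a count matrix. The toy matrix and the variable name mito.percent are assumptions for illustration only; runCellQC computes and stores its own metrics in the colData of the SCE.

```r
# Toy count matrix: 5 genes (rows) x 3 cells (columns); values are made up
counts <- matrix(
  c(5, 0, 2,
    3, 1, 0,
    2, 4, 8,
    10, 5, 0,
    0, 10, 10),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("MT-CO1", "MT-ND1", "CD3D", "LYZ", "MS4A1"),
    c("cell1", "cell2", "cell3")
  )
)

# Same pattern as in the tutorial: gene symbols that start with "MT-"
mito.genes <- grep("^MT-", rownames(counts), value = TRUE)

# Percentage of each cell's counts that come from mitochondrial genes
mito.percent <- 100 * colSums(counts[mito.genes, , drop = FALSE]) / colSums(counts)
print(round(mito.percent, 1))
```

Cells with an unusually high value in this metric are typical candidates for removal during filtering.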
Individual sets of QC metrics can be plotted with specific functions. For example, to plot the distributions of the total numbers of UMIs derived from runPerCellQC, doublet scores from runScDblFinder, and contamination scores from runDecontX (all of which were run by the runCellQC function), the following plotting functions can be used:
plotRunPerCellQCResults(sce)
plotScDblFinderResults(sce, reducedDimName = "decontX_UMAP")
plotDecontXResults(sce, reducedDimName = "decontX_UMAP")
A comprehensive HTML report can be generated to visualize and explore the QC metrics in greater detail:
reportCellQC(sce)
After examining the distributions of various QC metrics, poor quality cells will need to be removed. Typically, thresholds for QC metrics should exclude cells that are outliers of the distribution (i.e. long tails in the violin or density plots). Cells can be removed using the subsetSCECols function. Metrics stored in the colData of the SCE object can be filtered using the colData parameter. Here we keep only cells with more than 600 counts and more than 300 detected genes:
# Filter SCE
sce <- subsetSCECols(sce, colData = c("total > 600", "detected > 300"))
# See number of cells after filtering
ncol(sce)
## [1] 2675
Other common metrics to filter on include subsets_mito_percent for removal of cells with a high mitochondrial percentage, decontX_contamination for removal of cells with higher levels of contamination from ambient RNA, and scDblFinder_class to remove doublets (or calls from any of the other doublet detection algorithms). See the singleCellTK documentation for more information on performing comprehensive QC and filtering.
In general, removing features with low numbers of counts across all cells is recommended to reduce computational run time. A simple selection can be performed with the selectFeatures function by keeping only features that have at least a minimum number of counts in at least a minimum number of cells:
# Select features with at least 3 counts in at least 3 cells
library(celda)
useAssay <- "counts"
altExpName <- "featureSubset"
sce <- selectFeatures(sce, minCount = 3, minCell = 3, useAssay = useAssay, altExpName = altExpName)
# See number of features after filtering
nrow(altExp(sce, altExpName))
## [1] 2639
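The rule stated in the code comment above ("at least 3 counts in at least 3 cells") can be sketched in base R on a toy matrix. This is an illustration of the rule as described in the text, not a copy of the selectFeatures implementation; the matrix values and the names keep and filtered are assumptions.

```r
# Toy count matrix: 4 features x 5 cells (made-up values)
counts <- matrix(
  c(3, 4, 5, 0, 0,   # >= 3 counts in 3 cells -> kept
    1, 1, 1, 1, 1,   # detected everywhere but never reaches 3 counts -> dropped
    9, 9, 0, 0, 0,   # >= 3 counts in only 2 cells -> dropped
    0, 3, 3, 3, 3),  # >= 3 counts in 4 cells -> kept
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)

minCount <- 3
minCell <- 3

# For every feature, count the cells that reach minCount, then apply minCell
keep <- rowSums(counts >= minCount) >= minCell
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```

Note that gene2 is dropped even though it is detected in every cell, because the rule looks at the per-cell count rather than the total across cells.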
The useAssay parameter is used to denote which assay/matrix within the SCE to use for filtering. The default raw counts matrix is traditionally stored in the "counts" assay. If decontX was previously run during QC, then the decontaminated counts can be used by setting this parameter to "decontXcounts". We will save this parameter in a variable called useAssay which will be used as input in several downstream functions.
Note: The subsetted matrix is stored in the “alternative experiment” slot (altExp) within the SCE. This allows for a matrix with a different number of rows to be stored within the same SCE object (rather than creating two SCE objects). The celda functions described in the next several sections operate on a matrix stored in the altExp slot. The default name given to the alternative experiment and used in all downstream celda functions is "featureSubset". If the altExpName parameter is changed here, then it will need to be supplied to downstream plotting functions as well. The list of alternative experiments in an SCE can be viewed with altExpNames(sce). If you already have an SCE with selected features or do not want to perform feature selection, then you need to set the alternative experiment directly with a command like altExp(sce, "featureSubset") <- assay(sce, "counts"). In the future, this will be simplified by utilizing the ExperimentSubset package.
If the number of features is still relatively large (e.g. >5000), an alternative approach is to select highly variable features that can be used in the downstream clustering. The advantage of this approach is that it can greatly speed up celda and can improve module detection among highly variable features with overall lower expression. The disadvantage of this approach is that features that do not fall into the highly variable group will not be clustered into modules. The celda package does not include methods for selection of highly variable genes (HVGs). However, the singleCellTK package provides wrappers for methods used in Seurat and Scran. We recommend keeping at least 2,000-5,000 HVGs for clustering. Here is some example code showing how to select the top 5,000 most variable genes and store them back in the SCE as an altExp:
library(singleCellTK)
sce <- seuratFindHVG(sce, useAssay = useAssay, hvgMethod = "vst")
g <- getTopHVG(sce, method = "vst", n = 5000)
altExp(sce, altExpName) <- sce[g, ]
For the rest of the analysis with the PBMC3K data, we will use the first approach, where features with at least 3 counts in at least 3 cells were included.
As mentioned earlier, celda is a discrete Bayesian model that is able to simultaneously bi-cluster features into modules and cells into cell clusters. The primary bi-clustering model can be accessed with the function celda_CG. This function operates on a matrix stored as an alternative experiment in the altExp slot. If you did not perform feature selection as recommended in the previous section and your matrix of interest is not currently located in an altExp slot, the following code can be used to copy a matrix in the main assay slot to the altExp slot:
useAssay <- "counts"
altExpName <- "featureSubset"
altExp(sce, altExpName) <- assay(sce, useAssay)
The two major adjustable parameters in this model are L, the number of modules, and K, the number of cell populations. The following code bi-clusters the PBMC3K dataset into 100 modules and 15 cell populations:
sce <- celda_CG(sce, L = 100, K = 15, useAssay = useAssay, altExpName = altExpName)
However, in most cases, the number of feature modules (L) and the number of cell clusters (K) are not known beforehand. In the next sections, we outline procedures that can be used to suggest reasonable choices for these parameters. If the data is clustered with the code above by supplying K and L directly to the celda_CG function, then you can skip the next section and proceed to Creating 2-D embeddings.
To help choose reasonable values for L and K, celda provides step-wise splitting procedures along with measurements of perplexity. First, the function recursiveSplitModule can be used to cluster features into modules for a range of L. Within each step, the best split of an existing module into 2 new modules is chosen to create the L-th module. The module labels of the previous model with \(L-1\) modules are used as the initial starting values in the next model with \(L\) modules. Note that the initialization step may take longer with larger numbers of cells in the dataset and the splitting procedure will take longer with larger numbers of features in the dataset. Celda models with an L range between initialL = 10 and maxL = 150 are tested in the example below.
moduleSplit <- recursiveSplitModule(sce, useAssay = useAssay, altExpName = altExpName, initialL = 10, maxL = 150)
Perplexity has been commonly used in topic models to measure how well a probabilistic model predicts observed samples (Blei et al., 2003). Here, we use perplexity to evaluate the performance of individual models by calculating the probability of observing expression counts given an estimated Celda model. Rather than performing cross-validation, which is computationally expensive, a series of test sets are created by sampling the counts from each cell according to a multinomial distribution defined by dividing the counts for each gene in the cell by the total number of counts for that cell. Perplexity is then calculated on each test set and can be visualized using the function plotGridSearchPerplexity. A lower perplexity indicates a better model fit.
plotGridSearchPerplexity(moduleSplit, altExpName = altExpName, sep = 10)
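The resampling idea described above can be sketched in base R. This is a simplified illustration under stated assumptions, not celda's implementation: the simulated data, the two toy "models" (one shared set of gene probabilities versus one set per population), and the helper function perplexity are all hypothetical, and perplexity is taken here as the exponential of the negative log-likelihood per count.

```r
set.seed(1)

# Simulate two cell populations with different gene usage (toy data, 20 genes)
nGenes <- 20
pA <- c(rep(10, 10), rep(1, 10))
pA <- pA / sum(pA)
pB <- rev(pA)
cluster <- rep(c("A", "B"), each = 30)
counts <- sapply(cluster, function(cl) {
  rmultinom(1, size = 500, prob = if (cl == "A") pA else pB)[, 1]
})

# Test set: resample each cell from a multinomial defined by its own
# normalized counts (counts for each gene / total counts in the cell)
testSet <- apply(counts, 2, function(x) {
  rmultinom(1, size = sum(x), prob = x / sum(x))[, 1]
})

# Perplexity = exp(-log-likelihood / total number of counts)
perplexity <- function(x, prob) {
  exp(-sum(x * log(prob)) / sum(x))
}

# Model 1: every cell shares one set of gene probabilities
pGlobal <- rowSums(counts) / sum(counts)
probGlobal <- matrix(pGlobal, nrow = nGenes, ncol = ncol(counts))

# Model 2: gene probabilities are estimated separately for each population
pByCluster <- sapply(c("A", "B"), function(cl) {
  x <- rowSums(counts[, cluster == cl, drop = FALSE])
  x / sum(x)
})
probCluster <- pByCluster[, cluster]

perpGlobal <- perplexity(testSet, probGlobal)
perpCluster <- perplexity(testSet, probCluster)
print(c(global = perpGlobal, byCluster = perpCluster))
```

In this toy setting the population-aware model should give the lower perplexity, which mirrors how a lower value is read as a better fit when comparing Celda models.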
The perplexity alone often does not show a clear elbow or “leveling off”. However, the rate of perplexity change (RPC) can be more informative to determine when adding new modules does not add much additional information (Zhao et al., 2015). An RPC closer to zero indicates that the addition of new modules or cell clusters is not substantially decreasing the perplexity. The RPC of models can be visualized using the function plotRPC:
plotRPC(moduleSplit, altExpName = altExpName)
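As a rough illustration of what an RPC curve summarizes, the sketch below computes the absolute change in perplexity per added module between consecutive models. The perplexity values are hypothetical, and plotRPC may differ in details such as sign conventions or smoothing.

```r
# Hypothetical perplexity values for models fit with increasing L
L <- c(10, 20, 30, 40, 50)
perplexity <- c(500, 400, 350, 340, 338)

# Rate of perplexity change between consecutive models:
# |change in perplexity| / change in L
rpc <- abs(diff(perplexity) / diff(L))
names(rpc) <- paste0("L", L[-1])
print(rpc)
```

Here the RPC drops from 10 to 0.2, so most of the improvement is gained in the first few steps and later models add little.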
In this case, we will choose an L of 80 as the RPC curve tends to level off at this point:
L <- 80
The perplexity and RPC plots are intended as guides for choosing L. However, they may not always give a clear “leveling off”, depending on the complexity and quality of the dataset. Do not give up if the choice of L is unclear or imperfect! If the appropriate L is unclear from these plots, then you can set a somewhat high number (e.g. 75) and move to the next step of selecting K. Later on, manual review of modules using functions such as moduleHeatmap can give a sense of whether individual modules should be further split up by selecting a higher L. For example, you can start exploring the cell populations and modules with L = 75. If some modules need to be further split, you can then try L = 100, L = 125, and so on.
Now we extract the Celda model with L = 80 using the function subsetCeldaList and run recursiveSplitCell to fit models with a range of K between 3 and 25:
temp <- subsetCeldaList(moduleSplit, list(L = L))
sce <- recursiveSplitCell(sce, useAssay = useAssay, altExpName = altExpName, initialK = 3, maxK = 25, yInit = celdaModules(temp))
The perplexities and RPC of models can be visualized using the same functions, plotGridSearchPerplexity and plotRPC:
plotGridSearchPerplexity(sce)
plotRPC(sce)
The perplexity continues to decrease with larger values of K. The RPC generally levels off between 13 and 16, and we choose the model with K = 14 for downstream analysis. The following code selects the final celda_CG model with L = 80 and K = 14:
K <- 14
sce <- subsetCeldaList(sce, list(L = L, K = K))
Note: Similar to choosing L, you can guess an initial value of K based on the perplexity and RPC plots and then move to the downstream exploratory analyses described in the next several sections. After reviewing the cell clusters on 2-D embeddings and module heatmaps, you may have to come back to tweak the choice of K until you have something that captures the cellular heterogeneity within the data without “over-clustering” cells into too many subpopulations. This may be an iterative procedure of going back and forth between choices of K and plotting the results. So do not let imperfect perplexity/RPC plots prevent you from moving on to the rest of the analysis. Often, using an initial guess for K will allow you to move on in the analysis and get a sense of the major sources of biological heterogeneity present in the data.
After selecting a celda model with specific values of L and K, we can then perform additional exploratory and downstream analyses to understand the biology of the transcriptional modules and cell populations. We can start by generating a dimension reduction plot with the Uniform Manifold Approximation and Projection (UMAP) method to visualize the relationships between the cells in a 2-D embedding. This can be done with the function celdaUmap.
sce <- celdaUmap(sce, useAssay = useAssay, altExpName = altExpName)
Alternatively, a t-distributed stochastic neighbor embedding (t-SNE) can be generated using the function celdaTsne. The UMAP and t-SNE plots generated by celdaUmap and celdaTsne are computed based on the module probabilities (analogous to using PCs from PCA). The calculated dimension reduction coordinates for the cells are stored under the reducedDim slot of the altExp slot in the original SCE object. The following command lists the names of the dimensionality reductions that can be used in downstream plotting functions in the next few sections:
reducedDimNames(altExp(sce, altExpName))
## [1] "decontX_UMAP" "celda_UMAP"
The function plotDimReduceCluster can be used to plot the cluster labels for cell populations identified by celda on the UMAP:
plotDimReduceCluster(sce, reducedDimName = "celda_UMAP", labelClusters = TRUE)
Usually, biological features of some cell populations are known a priori and can be identified with known marker genes. The expression of selected marker genes can be plotted on the UMAP with the function plotDimReduceFeature.
markers <- c("CD3D", "IL7R", "CD4", "CD8B", "CD19", "FCGR3A", "CD14", "FCER1A", "PF4")
plotDimReduceFeature(x = sce, features = markers, reducedDimName = "celda_UMAP", useAssay = useAssay, altExpName = altExpName, normalize = TRUE)
The parameter displayName can be used to switch between IDs stored in the rownames of the SCE and columns of the rowData of the SCE. If the assay denoted by useAssay is a raw counts matrix, then setting normalize = TRUE is recommended (otherwise the z-score of the raw counts will be plotted). When set to TRUE, each count will be normalized by dividing by the total number of counts in each cell. An alternative approach is to perform normalization with another method and then point to the normalized assay with the useAssay parameter. For example, normalization can be performed with the scater package:
library(scater)
sce <- logNormCounts(sce, exprs_values = useAssay, name = "logcounts")
plotDimReduceFeature(x = sce, features = markers, reducedDimName = "celda_UMAP", useAssay = "logcounts", altExpName = altExpName, normalize = FALSE)
This second approach may be faster if plotting a lot of marker genes or if the dataset is relatively large.
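The simple normalization described for normalize = TRUE (dividing each count by the total number of counts in its cell) followed by z-scoring can be sketched in base R as follows. The toy matrix and the variable names are assumptions for illustration; the plotting function handles these steps internally.

```r
# Toy count matrix: 3 genes x 3 cells (made-up values)
counts <- matrix(
  c(2, 0, 6,
    2, 5, 3,
    4, 5, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD3D", "CD14", "CD19"), c("cell1", "cell2", "cell3"))
)

# Divide every count by the total number of counts in its cell (column)
normalized <- sweep(counts, 2, colSums(counts), "/")

# z-score each gene across cells so that genes are on a comparable scale
zscored <- t(scale(t(normalized)))
print(round(zscored, 2))
```

After the first step every cell sums to 1, so differences in sequencing depth between cells no longer drive the plotted values.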
Once we have identified the various cell subpopulations using the known marker genes, custom labels can be added to the UMAP colored by cluster:
g <- plotDimReduceCluster(sce, reducedDimName = "celda_UMAP", altExpName = altExpName, labelClusters = TRUE)
labels <- c("1: Megakaryocytes",
            "2: CD14+ Monocytes 1",
            "3: CD14+ Monocytes 2",
            "4: FCGR3A (CD16+) Monocytes",
            "5: CD14+ Monocytes 3",
            "6: CD8+ Cytotoxic T-cells",
            "7: CD4+ T-cells",
            "8: CD8+ Cytotoxic T-cells",
            "9: B-cells",
            "10: Naive CD8+ T-cells",
            "11: Naive CD4+ T-cells",
            "12: NK-cells",
            "13: Unknown T-cells",
            "14: Dendritic cells")
library(ggplot2)
g <- g + scale_color_manual(labels = labels,
                            values = distinctColors(length(labels)))
print(g)
Celda has the ability to identify modules of co-expressed features and quantify the probability of these modules in each cell population. An overview of the relationships between modules and cell subpopulations can be explored with the function celdaProbabilityMap. The “Absolute probability” heatmap on the left shows the proportion of counts in each module for each cell population. It gives insights into the absolute abundance of a module within a given cell subpopulation and can be used to explore which modules are higher than other modules within a cell population. The “Relative expression” heatmap shows the standardized (z-scored) module probabilities across cell subpopulations. It can be used to explore in which cell populations a given module is higher relative to the other cell populations.
celdaProbabilityMap(sce, useAssay = useAssay, altExpName = altExpName)
In this plot, we can see a variety of patterns. Modules 15-20 are highly expressed across most cell populations, indicating that they may contain housekeeping genes (e.g. ribosomal). Other modules are specific to a cell population or groups of cell populations. For example, module 35 is only active in population 1, while module 70 is expressed across populations 2, 3, and to some degree in population 5. The unknown T-cell population 13 has highly specific levels of module 30. In the next section, we can look at the genes in these modules to gain insights into the biological properties of each of these cell populations.
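The difference between the absolute and relative views can be sketched in base R with a small matrix of hypothetical counts summed per module and cell population. The numbers and names below are assumptions for illustration and do not come from the PBMC3K model; celdaProbabilityMap derives its values from the fitted model.

```r
# Hypothetical counts summed per module (rows) and cell population (columns)
moduleCounts <- matrix(
  c(50, 45, 55,    # module 1: abundant in every population
    40,  5,  5,    # module 2: mostly in population 1
    10, 50, 40),   # module 3: mostly in populations 2 and 3
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("L", 1:3), paste0("K", 1:3))
)

# Absolute: proportion of counts in each module within a population
absolute <- sweep(moduleCounts, 2, colSums(moduleCounts), "/")

# Relative: z-score of each module's proportion across populations
relative <- t(scale(t(absolute)))

print(absolute)
print(round(relative, 2))
```

In the absolute view each column sums to 1, so it compares modules within a population. In the relative view each row is centered, so it highlights the populations in which a module stands out, such as module L2 in population K1.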
The primary advantage of celda over other tools is that it can cluster features that are co-expressed across cells into modules. These modules are often more biologically coherent than features correlated with principal components from PCA. Below are several ways in which modules can be explored and visualized.
The function featureModuleTable can be used to get the names of all features in each module into a data.frame.
# Save to a data.frame
ta <- featureModuleTable(sce, useAssay = useAssay, altExpName = altExpName)
dim(ta)
## [1] 154 80
head(ta[,"L70"])
## [1] "S100A9" "S100A8" "S100A12" "RBP7" "FOLR3" "C19orf59"
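The shape of this table (one column per module, padded to the size of the largest module) can be sketched in base R. The feature names and module assignments below are hypothetical, and the sketch does not reproduce how featureModuleTable orders features within a module.

```r
# Hypothetical feature-to-module assignments
features <- c("S100A9", "S100A8", "LYZ", "CD3D", "CD3E", "MS4A1")
modules <- c(1, 1, 1, 2, 2, 3)

# Group the feature names by module
byModule <- split(features, modules)

# Pad shorter modules with empty strings so all columns have equal length
maxSize <- max(lengths(byModule))
padded <- sapply(byModule, function(x) c(x, rep("", maxSize - length(x))))
colnames(padded) <- paste0("L", names(byModule))

moduleTable <- as.data.frame(padded, stringsAsFactors = FALSE)
print(moduleTable)
```

The number of rows equals the size of the largest module, which is why smaller modules show empty cells at the bottom of their columns.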
The parameter displayName can be used to switch between IDs stored in the rownames of the SCE and columns of the rowData of the SCE. If the outputFile parameter is set, the table will be saved to a tab-delimited text file instead of being returned as a data.frame:
# Save to file called "modules.txt"
featureModuleTable(sce, useAssay = useAssay, altExpName = altExpName, outputFile = "modules.txt")
The modules for this model are shown below:
| L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 | L9 | L10 | L11 | L12 | L13 | L14 | L15 | L16 | L17 | L18 | L19 | L20 | L21 | L22 | L23 | L24 | L25 | L26 | L27 | L28 | L29 | L30 | L31 | L32 | L33 | L34 | L35 | L36 | L37 | L38 | L39 | L40 | L41 | L42 | L43 | L44 | L45 | L46 | L47 | L48 | L49 | L50 | L51 | L52 | L53 | L54 | L55 | L56 | L57 | L58 | L59 | L60 | L61 | L62 | L63 | L64 | L65 | L66 | L67 | L68 | L69 | L70 | L71 | L72 | L73 | L74 | L75 | L76 | L77 | L78 | L79 | L80 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CCL3 | GNLY | CTSW | NKG7 | RPS19 | MT-CO2 | MT-CO3 | DDX5 | RPL28 | RPL18A | FOS | EIF1 | JUNB | GIMAP7 | RPL13A | RPS6 | RPS2 | RPL10 | RPL13 | RPS14 | RPSA | RPS27 | LTB | PTPRCAP | MALAT1 | LDHB | IL32 | CD79B | CD37 | TUBA1B | GAPDH | PPIA | ACTG1 | CCL5 | PPBP | RGS10 | OAZ1 | TAGLN2 | MT-ND1 | MT-CO1 | ARPC3 | SH3BGRL3 | CYBA | PTMA | TMSB10 | LAPTM5 | ARHGDIB | HLA-B | CFL1 | SRGN | ACTB | TMSB4X | C9orf142 | ANXA1 | UBB | B2M | MYL12A | HLA-A | FCGR3A | IFITM2 | FAM26F | FCER1G | AIF1 | FTH1 | FCER1A | HLA-DQA1 | HLA-DPB1 | CD74 | HLA-DRA | S100A9 | LYZ | CST3 | VIM | NEAT1 | S100A4 | GSTP1 | LGALS1 | GABARAP | TYROBP | FTL |
| IGFBP7 | GZMB | CD247 | GZMA | NACA | CD52 | MT-ND4 | TSC22D3 | RPS9 | RPL12 | FXYD5 | H3F3B | TMEM66 | GIMAP4 | RPS18 | RPS3 | RPL19 | RPL11 | RPL32 | EEF1A1 | JUN | RPL21 | MYC | CXCR4 | MYLIP | IL7R | CD3D | CD79A | SNHG7 | HMGB2 | EIF4A1 | HNRNPA2B1 | CORO1A | GZMK | PF4 | TUBA4A | FKBP1A | GDI2 | PFDN5 | LSP1 | YBX1 | SERF2 | CLIC1 | HNRNPA1 | EIF3K | SNX3 | UBC | SRP14 | PSMB9 | ITGB2 | PFN1 | GMFG | APOBEC3G | HCST | RAC2 | HLA-C | HSPA8 | CALM1 | RHOC | CTSC | NCF1 | FGR | LST1 | COTL1 | CLEC10A | HLA-DQB1 | HLA-DPA1 | IRF8 | HLA-DMA | S100A8 | LGALS2 | CFP | S100A10 | ISG15 | S100A6 | GPX1 | TYMP | TSPO | FCN1 | CTSS |
| HAVCR2 | FGFBP2 | GZMM | CST7 | NAP1L1 | PPDPF | MT-CYB | TXNIP | FAU | RPL8 | CD48 | DUSP1 | ZFP36L2 | FYB | RPS8 | RPS12 | RPLP1 | RPL6 | RPLP2 | RPS4X | NPM1 | RPS3A | SIT1 | ISG20 | ATM | NOSIP | CD3E | MS4A1 | SNX2 | EIF1AY | SLC25A5 | HMGB1 | CHCHD2 | LAG3 | HIST1H2AC | CDC42SE2 | TALDO1 | ATP5C1 | SLC25A6 | ATP6V0E1 | LY6E | ARPC1B | SUPT4H1 | RPL36AL | ATP5E | UQCRH | MYL12B | PSME1 | PPP1CA | CD63 | MYL6 | CAPZB | CDC37 | ID2 | UCP2 | HLA-E | EVL | CD99 | CDKN1C | MYO1G | LYN | CD86 | IFITM3 | SAT1 | ENHO | HLA-DQA2 | HLA-DRB1 | LAT2 | LY86 | S100A12 | MS4A6A | CPVL | NFKBIA | ANXA2 | S100A11 | AP1S2 | LGALS3 | RAC1 | NCF2 | NPC2 |
| CCL4L1 | CCL4 | LYAR | PRF1 | SOD1 | ATP6V1G1 | MT-ND2 | CIRBP | UBA52 | RPL29 | APRT | ITM2B | TMEM123 | GIMAP1 | RPL10A | RPL3 | RPL15 | RPL26 | RPS16 | RPL27A | RPSAP58 | RPS27A | BIRC3 | CD69 | ANKRD44 | GIMAP5 | CD7 | TCL1A | PRKCB | MANF | TPI1 | HSP90AA1 | ENO1 | SDPR | FERMT3 | H3F3A | PRDX6 | ATP5G2 | BRK1 | RHOA | ALDOA | IFI35 | COX7C | ATP5L | GABARAPL2 | YWHAB | RBM3 | PSMB8 | CTSD | ARPC2 | TUBA1A | ARGLU1 | RNF181 | CD53 | ARL6IP5 | IFITM1 | SEPT7 | CKB | ABI3 | POU2F2 | CD300C | CFD | PSAP | DNASE1L3 | CD1C | HLA-DRB5 | EAF2 | HLA-DMB | RBP7 | CD14 | IGSF6 | AMICA1 | PRELID1 | GRN | TNFSF13B | PYCARD | CDA | BRI3 | ||
| SPON2 | HOPX | GZMH | CYTIP | PRR13 | MT-ATP6 | LIMD2 | RPS24 | GNB2L1 | GSTK1 | KLF6 | BTG2 | CITED2 | RPS5 | RPS15A | RPS15 | RPL14 | RPS28 | TPT1 | HINT1 | RPL9 | RIC3 | STK17A | CARS | C12orf57 | CD2 | IGLL5 | ARL4A | HMGA1 | PKM | HMGN1 | SLC25A3 | TSC22D1 | DNAJB6 | GSTO1 | DNAJC8 | ZFP36 | C11orf31 | CALM2 | LAMTOR4 | IRF7 | CNBP | PSMA7 | POLD4 | OST4 | C19orf43 | TPM3 | MYO1F | EMP3 | PGK1 | TMCO1 | CKLF | PLAC8 | BIN2 | AES | PDIA3 | LYPD2 | ATP1B3 | SCPEP1 | FAM49A | SERPINA1 | TIMP1 | SERPINF1 | FCGR2B | PLD4 | MEF2C | RNASE6 | FOLR3 | ALDH2 | RAB32 | MYADM | IFI6 | CEBPD | RNF130 | TKT | SLC7A7 | CTSB | |||
| CLIC3 | C12orf75 | KLRD1 | CLNS1A | SYF2 | SSR2 | KLF2 | COX4I1 | RPS11 | CD44 | IER2 | NDFIP1 | SEPW1 | RPS23 | RPL23A | RPS7 | RPL27 | RPL36 | RPL7 | EIF4A2 | BTG1 | STMN3 | ACAP1 | DNAJB1 | LCK | IGJ | HVCN1 | STMN1 | COX8A | SRSF7 | LDHA | NRGN | LIMS1 | SOD2 | YWHAH | SERP1 | COPE | RNASET2 | COX5B | MIR142 | TMA7 | COX6B1 | SEP15 | NDUFA4 | SUMO2 | COX6A1 | FLNA | PSME2 | ARF1 | ADRM1 | IL10RA | CD164 | RAP1B | RARRES3 | LITAF | VMO1 | SPN | SYNGR2 | ARRB1 | CD68 | CEBPB | LILRA4 | PPP1R14A | CCDC50 | GAPT | C19orf59 | IL8 | CD33 | PPT1 | C1orf162 | FCGRT | STX11 | AP2S1 | FPR1 | BLVRA | |||||
| XCL2 | CD8A | CD160 | CCT7 | EVI2B | TMEM14B | HNRNPDL | BTF3 | RPL23 | PPP1R15A | GPSM3 | RWDD1 | GIMAP2 | RPL18 | RPL30 | RPL7A | EEF1D | RPL22 | RPL35A | SNRPD2 | GLTSCR2 | CRIP2 | RHOH | PRDX2 | LAT | LINC00926 | SYPL1 | PRDX4 | PSMB3 | RAN | PPIB | TUBB1 | MAX | POLE4 | MRPL14 | GPX4 | COX5A | DRAP1 | TCEB2 | LAP3 | EIF3F | VAMP8 | LSM7 | SKP1 | EIF3G | GUK1 | GLIPR2 | ARPC5 | LRRFIP1 | PPP2CA | CMTM3 | PPP1R18 | HLA-F | TRAF3IP3 | XBP1 | ADA | LYST | TCF7L2 | SPI1 | STXBP2 | PHACTR1 | CSF3R | CD302 | CORO1B | RHOG | CTSH | MNDA | ATP6V0B | MAFB | RHOB | ||||||||||
| TTC38 | KLRG1 | FCRL6 | ATP1A1 | YPEL5 | C19orf24 | ANAPC16 | PABPC1 | EEF2 | PLP2 | EDF1 | SOCS3 | DGCR6L | RPL5 | RPS25 | RPL35 | RPL24 | C6orf48 | RPL34 | SELL | FOXP1 | TNFRSF4 | BIN1 | CD27 | CD3G | BANK1 | ADK | CHTF8 | GNB2 | HNRNPK | SRSF3 | GNG11 | AP3S1 | RAB10 | SPINT2 | ATP5D | WDR83OS | SH3BGRL | BST2 | TMEM179B | UBE2D3 | ARL6IP4 | RASGRP2 | COX6C | TMBIM6 | TRAPPC1 | EFHD2 | ATP5B | CALM3 | PDHB | FYN | SCP2 | DHRS7 | IL2RG | ANXA6 | SH3BP1 | LYL1 | IFI30 | ASAH1 | PLBD1 | LILRB4 | TMEM14C | MT2A | BLVRB | FGL2 | RGS2 | FCGR2A | PLAUR | ||||||||||||
| AKR1C3 | ZAP70 | XCL1 | ARID4B | DAZAP2 | MPC2 | VAMP2 | EIF3H | NBEAL1 | KHDRBS1 | UQCR11 | TRADD | TMEM173 | RPS10 | RPL31 | RPL4 | RPL38 | C21orf33 | RPL17 | PEBP1 | RP11-796E2.4 | RP11-706O15.1 | OCIAD2 | LEPROTL1 | CD8B | VPREB3 | SMIM14 | SP140 | UQCRFS1 | MIF | PARK7 | RGS18 | CTSA | PTPN18 | H2AFJ | HIGD2A | FIS1 | BLOC1S1 | SF3B5 | PLSCR1 | ERP29 | NEDD8 | MX1 | PTPRC | SAP18 | CAP1 | PLEK | MSN | FKBP8 | TPST2 | ETHE1 | TMEM9B | TMEM50A | CCND3 | YWHAQ | GUSB | STX7 | APOBEC3A | BID | ASGR1 | RAB31 | DHRS4L2 | CSTB | NUP214 | KLF4 | LINC00936 | TMEM176B | CTSZ | ||||||||||||
| PRSS23 | APMAP | KLRC1 | THYN1 | SP110 | TMED4 | UBE2D2 | PNRC1 | EIF3L | SURF1 | SSR4 | CHURC1 | GBP1 | RPS13 | RPS20 | RPL37A | RPS4Y1 | C1orf228 | RPL36A | CMPK1 | KCNQ1OT1 | KRT1 | FAIM3 | PIK3IP1 | OPTN | FCER2 | MYCBP2 | HIST1H4C | ANAPC11 | SRSF2 | PSMB1 | CLU | NCOA4 | GLUL | TPM4 | FOSB | PSMB7 | RNH1 | GLRX | TMEM205 | GTF3A | NDUFA13 | MAF1 | PTGES3 | JTB | LCP1 | CD300A | HN1 | ATRAID | PPP6C | PTGER2 | ZNF207 | SIGIRR | SRP9 | ALOX5AP | FAM110A | YBX3 | MS4A7 | WARS | TNFAIP2 | CD4 | C10orf54 | IL1B | ODF3B | CNPY3 | VSTM1 | TBXAS1 | |||||||||||||
| GPR56 | SAMD3 | RP11-347P5.1 | CCDC12 | MRPL21 | UXT | UBXN1 | ZFAS1 | STAT1 | MT-ND5 | STAT3 | DNAJA2 | RPLP0 | RPS29 | RPL37 | TOMM7 | COMMD6 | PPA1 | LBH | RGCC | ITM2A | PDLIM1 | CYB561A3 | PCNA | PGAM1 | HNRNPC | H2AFZ | CD9 | RHEB | RNPEP | COMMD3 | NDUFA11 | SRSF9 | PTPN6 | SERPINB1 | TLN1 | PCBP2 | ATP5G3 | RCSD1 | ACTR3 | EID1 | HCLS1 | AOAH | DOK2 | M6PR | ATXN10 | CDC42EP3 | NDUFS2 | SH3KBP1 | FUS | TMED9 | ASB8 | SNX10 | HCK | NINJ1 | BST1 | UBE2Q1 | ANXA5 | PGD | SMCO4 | CAPG | RETN | G0S2 | |||||||||||||||||
| HBA1 | NCR3 | TTC3 | STUB1 | TMEM165 | PNISR | RPL39 | HPCAL1 | PCBP1 | TNFAIP3 | SORL1 | EEF1B2 | RPS26 | EIF3D | SRSF5 | ST13 | ANKRD12 | GPR183 | TCF7 | RORA | P2RX5 | MBD4 | NME1 | TIMM13 | SPCS1 | GHITM | ACRBP | ODC1 | FAM45A | ETFA | MCL1 | ARF5 | ZNF706 | ATP6V1F | ANXA4 | HSP90AB1 | UBL5 | DDT | COX7A2 | NDUFB8 | ATP5F1 | CX3CR1 | PTP4A2 | C1orf43 | TMX2 | GYG1 | PSMC4 | RASSF5 | DUSP2 | SUN2 | UTRN | NOTCH2NL | RP11-290F20.3 | CPPED1 | FCGR1A | H2AFY | MSRB1 | TGFBI | LGALS9 | MTMR11 | C5AR1 | |||||||||||||||||||
| PTGDS | CHST12 | RBM4 | C19orf70 | LYPLA1 | RBMX | ZFP36L1 | RBMS1 | ICAM3 | UBAC2 | PTGER4 | RPS21 | ZFAND1 | MZT2B | CCNI | CCDC109B | SLC2A3 | RCN2 | SPOCK2 | SH2D1A | BLK | IFT57 | SNRNP25 | MOB1A | C1QBP | PRDX1 | MMD | RBBP6 | MPP1 | CIR1 | LAMTOR1 | PPP4C | GNG5 | UBE2L6 | LAIR1 | ATP5O | COX7A2L | HERPUD1 | CIB1 | C9orf16 | VPS28 | MIR4435-1HG | CAPZA2 | GBP2 | RPL7L1 | IFI44 | SRP72 | SYTL1 | GYPC | TAP1 | TMEM18 | BTK | LILRA5 | DUSP6 | QPCT | PGLS | VCAN | NUDT16 | SAT2 | CSTA | MPEG1 | |||||||||||||||||||
| PTPN7 | DSCR3 | TNFRSF14 | PCSK7 | SON | APEX1 | USP3 | NDUFB11 | HSPB1 | MT1X | SNHG8 | FBL | RPL41 | AIMP1 | ARHGAP15 | PPM1K | CCR7 | ETS1 | FCRLA | PPAPDC1B | AHCY | CYC1 | TRMT112 | SNRPB | CA2 | AMD1 | PLEKHO1 | ADIPOR1 | SCAND1 | EIF5 | VASP | POLR2L | DDAH2 | HNRNPA0 | CRIP1 | PAPOLA | TMEM59 | UFC1 | DBI | ACTN4 | RAB8A | VPS29 | CLEC2B | SMIM12 | ZFAND6 | RASAL3 | PPP2R5C | BTN3A2 | INSIG1 | UNC93B1 | PILRA | TESC | PID1 | CARD16 | ID1 | PRAM1 | IFNGR2 | CYBB | ||||||||||||||||||||||
| TIGIT | BAZ1A | POLR2I | TBC1D10C | EIF2S3 | ACTR1B | CHMP4A | MRPS33 | RCBTB2 | EEF1G | EBPL | CUTA | TNFAIP8 | ARID5B | AQP3 | CDC25B | MZB1 | NAT9 | MCM5 | SNF8 | ERH | UBE2I | PTCRA | GRAP2 | MTHFD2 | FDFT1 | GNAS | LAMTOR5 | RBX1 | SEC11A | PARP14 | ANP32B | ATP5H | RTFDC1 | HNRNPF | ARF6 | DYNLL1 | ASCL2 | IDH2 | MKRN1 | EMG1 | FLOT1 | PMAIP1 | MAEA | DDIT4 | PRMT2 | CUX1 | SCIMP | LRRC25 | SLC16A3 | CXCL2 | CASP1 | CD1D | APLP2 | SLC11A1 | |||||||||||||||||||||||||
| PRR5 | MRPL18 | ARF4 | FAM65B | MED30 | SSU72 | RP11-51J9.5 | HNRNPH3 | EIF3E | PCNP | PPP3CC | FLT3LG | BCL11B | CD72 | RBM5 | FABP5 | FIBP | CNN2 | SEC61B | SPARC | RSU1 | SNN | GRSF1 | HSD17B11 | REEP5 | RGS19 | CASP4 | LMO4 | ATP5A1 | NDUFB10 | LSM10 | C11orf58 | HNRNPM | PSMB10 | MPST | CLTB | MYH9 | DPM1 | PSTPIP1 | RAB11B | RAB37 | TERF2IP | BUB3 | UBLCP1 | ALOX5 | LILRB2 | CTSL | EREG | ARRB2 | NCOR2 | JUND | CLEC7A | ||||||||||||||||||||||||||||
| KLRB1 | SMIM7 | N4BP2L1 | PPP1R2 | RP11-349A22.5 | RAP1A | IL27RA | EPB41L4A-AS1 | RSL1D1 | MZT2A | LY9 | TRAT1 | GATA3 | TSPAN13 | TPD52 | PTTG1 | TMEM208 | DAD1 | NDUFA1 | MYL9 | NT5C3A | TAX1BP3 | ILK | CHMP1B | ATP5EP2 | RTN4 | NOP10 | SKAP2 | NDUFB9 | UQCR10 | RPL22L1 | RBM39 | HNRNPA3 | PRDX5 | TMEM140 | PSMA4 | NMI | VPS25 | LINC00152 | NT5C | SEPT1 | HSPA5 | CDIP1 | HHEX | C19orf38 | NAGA | TNFSF10 | GSN | C4orf48 | SULT1A1 | ||||||||||||||||||||||||||||||
| MATK | TRIM22 | GMPR2 | NCOR1 | KIAA0040 | POLR1D | PRMT10 | IMPDH2 | CCNL1 | DDX24 | RP1-313I6.12 | PRKCQ-AS1 | SIRPG | HLA-DOB | ITM2C | MCM7 | MX2 | SERBP1 | ABRACL | GP9 | R3HDM4 | MFSD1 | CNDP2 | ERGIC3 | POLR2J | CHCHD10 | C20orf24 | EPN1 | C19orf53 | ATP5J2 | ZC3H15 | CDC42 | EIF3I | CSNK2B | OASL | MPC1 | CRELD2 | DNAJC1 | ATF6B | ARL4C | WIPF1 | PPM1N | SYK | SLC31A2 | GNAI2 | RASSF4 | AGTRAP | GNS | ||||||||||||||||||||||||||||||||
| IL2RB | GRK6 | C10orf32 | SP100 | IFI44L | UBE2K | TAF1D | NSA2 | TSTD1 | PITPNA-AS1 | LEF1 | GPR171 | SPIB | CXXC5 | FEN1 | TXNDC17 | TUBB | MDH2 | F13A1 | NGFRAP1 | MARCH2 | HIST1H2BK | IDH3G | SUMO3 | NDUFB5 | TMEM219 | U2AF1 | MRPL23 | LSM2 | YWHAZ | RABAC1 | NDUFB2 | UBE2B | MOB2 | SMARCA4 | FDPS | TECR | ARPC5L | CD83 | HMOX1 | DYNLT1 | ENTPD1 | GCA | ADAP2 | ||||||||||||||||||||||||||||||||||||
| S100B | C1orf63 | MESDC2 | MEAF6 | IGBP1 | LGALS3BP | ZNF331 | ILF3-AS1 | SBDS | NUP54 | MAL | AC092580.4 | PKIG | MAP3K8 | MAD2L1 | TIMM8B | SUB1 | PSMB6 | TMEM40 | CD82 | PARVB | THOC6 | MT-ND3 | DBNL | OAS1 | CLTA | ISCU | ATP5I | EIF1B | C4orf3 | ENSA | RAB7A | RNF149 | CMC1 | CTD-2035E11.3 | GNG2 | SEPT6 | LAMP1 | IFIT2 | LILRA3 | NAAA | OSM | C20orf27 | |||||||||||||||||||||||||||||||||||||
| CD320 | PPCS | POLR3K | RNPS1 | METTL9 | CAMK2G | YME1L1 | PSIP1 | CHI3L2 | SUSD3 | BLNK | IL4R | TYMS | MRPL28 | C14orf166 | ATP5J | TREML1 | PTTG1IP | CORO1C | POP7 | OXA1L | SNX17 | ATG3 | NDUFS7 | MORF4L1 | NDUFA2 | EZR | SEPT9 | TMEM258 | FAM49B | SQRDL | RAB9A | RAB27A | PHF14 | PIM1 | ARL6IP1 | SIGLEC10 | CSF1R | EPSTI1 | SULF2 | SCO2 | |||||||||||||||||||||||||||||||||||||||
| SH2D2A | ADAR | TLE4 | KRT10 | MRFAP1 | TMC8 | DDX18 | AAK1 | KIAA0125 | SWAP70 | EZH2 | SPCS3 | CYCS | UQCRQ | ITGA2B | ACTN1 | H1F0 | LMNA | TMEM147 | EIF6 | CD55 | AHNAK | NDUFS5 | ALKBH7 | DECR1 | PAIP2 | RBM8A | OSTF1 | UPP1 | MRPL19 | RRAGC | TMEM109 | ADD3 | MRPS6 | RELT | APOBEC3B | OAZ2 | MGST2 | NAPRT1 | |||||||||||||||||||||||||||||||||||||||||
| PLEKHF1 | DNAJC15 | MORF4L2 | EIF4B | NDUFA3 | ADSL | NDNL2 | LDLRAP1 | POU2AF1 | NXT2 | KIAA0101 | EIF3A | XRCC6 | CALR | CMTM5 | HIST1H1C | SOX4 | SLC39A3 | C1orf86 | XAF1 | MARCKSL1 | ZNHIT1 | NHP2L1 | RNF7 | MRPL43 | DNAJA1 | AP2M1 | ARPC4 | MGAT1 | DHX36 | FAM105A | MAPK1IP1L | HDAC1 | LSM14A | KYNU | GPBAR1 | HSBP1 | IER3 | EIF4EBP1 | |||||||||||||||||||||||||||||||||||||||||
| ZNHIT3 | SF3A1 | ARHGEF1 | CMTM7 | CCT2 | TTC39C | AL928768.3 | HELQ | RRM2 | ISOC2 | EIF4G2 | PSMD8 | CLDN5 | RUFY1 | DAPP1 | C14orf2 | WAS | ENY2 | VAMP5 | SMDT1 | MRPL54 | RNASEH2B | VAPA | IRF1 | ARHGDIA | MMP24-AS1 | GFER | CYB561D2 | ANXA2R | RNF167 | PIK3AP1 | RXRA | ATP6V0D1 | GMPR | NAGK | |||||||||||||||||||||||||||||||||||||||||||||
| PRKAR1A | SRSF11 | SQSTM1 | FAM107B | MGAT4A | RP5-887A10.1 | CENPN | GGCT | ATPIF1 | AURKAIP1 | ERV3-1 | RIOK3 | PQBP1 | PFDN2 | COX17 | SDCBP | CCDC85B | VDAC1 | YPEL3 | C9orf78 | TMEM230 | TBCB | ANAPC13 | CXXC1 | CELF1 | HMOX2 | C19orf66 | KIAA0930 | OSCAR | EIF4E2 | SRA1 | |||||||||||||||||||||||||||||||||||||||||||||||||
| APBB1IP | TAF7 | DPP7 | FGFR1OP2 | OXNAD1 | TNFRSF13B | GMNN | YIF1B | SPCS2 | MTDH | TUBA1C | ZNF263 | DNAJC7 | ACTR2 | LYSMD2 | SAMHD1 | EIF3M | HADHA | CCT6A | DYNLRB1 | KRTCAP2 | BAX | PLIN2 | STX18 | DTNBP1 | RPA2 | SLC9A3R1 | HES1 | LILRB1 | ZYX | MIR24-2 | |||||||||||||||||||||||||||||||||||||||||||||||||
| IKZF1 | KMT2E | FNBP1 | PIM2 | NUCB2 | TNFRSF17 | TK1 | HAUS4 | PSMC5 | PSMA5 | BBC3 | DERA | BNIP3L | ERP44 | GRB2 | SDHB | CSDE1 | PSMG2 | DRAM2 | SELK | PDCD6 | SELT | IFIT1 | PGM1 | BAZ2A | NDUFA5 | TSC22D4 | HES4 | TNFRSF1B | NR4A1 | ||||||||||||||||||||||||||||||||||||||||||||||||||
| SNAP23 | TAPSAR1 | MGST3 | DNAJB9 | RGL4 | GINS2 | HDGF | RANBP1 | H2AFV | PLA2G12A | PICALM | NENF | TWF2 | LSM6 | LAMTOR2 | SF1 | ETFB | CAPZA1 | PYURF | VDAC2 | TMBIM4 | ARRDC1 | DYNLL2 | HSH2D | DENND2D | BZW1 | CAMK1 | CAPNS1 | CECR1 | |||||||||||||||||||||||||||||||||||||||||||||||||||
| BEST1 | TCEA1 | NUCB1 | DEGS1 | CD40LG | ZWINT | PGP | DUT | TOMM22 | PGRMC1 | APP | AKR1A1 | EMC7 | ZFAND5 | NDUFS6 | TRA2B | COX14 | SMAP2 | S1PR4 | FAM96B | PSMD4 | IFI27 | APOBEC3C | TBCC | DEF6 | CD300E | UBE2D1 | MIDN | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| TAOK3 | UBXN4 | TGOLN2 | XXbac-BPG299F13.17 | TRABD2A | BIK | REEP3 | TBCA | NDUFC2 | FHL1 | MID1IP1 | NDUFS3 | CHMP4B | CD97 | NDUFV2 | PSMB4 | PLEKHJ1 | BCAP31 | LMAN2 | WDR1 | RSAD2 | ZBP1 | PBXIP1 | C5orf56 | CD300LF | THEMIS2 | GRINA | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| CARD8 | MED10 | PTPN2 | PARP1 | CD6 | CCNA2 | LMNB1 | HSP90B1 | LSM4 | SLC40A1 | HADHB | BAG1 | SMS | TGFB1 | UQCRC2 | PSMD9 | TMEM256 | ICAM2 | TAPBP | NDUFB7 | ATP5SL | ABHD14B | RSRC2 | ARHGEF40 | NANS | PTPRE | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| KLF3 | SEC62 | EVI2A | TOB1 | SH3YL1 | BIRC5 | ECH1 | PDIA6 | PNMA1 | TREX1 | LRPAP1 | CAT | GLIPR1 | EIF2S2 | ADI1 | BLOC1S2 | TMED2 | SUMO1 | RAB5C | COMMD10 | RGS1 | HDDC2 | CXCL16 | ATOX1 | CARS2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LIPA | CCT4 | TNIP1 | SVIP | CAMK4 | XRCC5 | TUFM | TPM1 | PARL | SHKBP1 | LTA4H | RILPL2 | MINOS1 | NAA10 | WBP2 | SSBP1 | GADD45B | CHMP2A | CD38 | IFRD1 | SMARCE1 | TPPP3 | MAPKAPK3 | DNTTIP1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZNF394 | KIF5B | EMC10 | TMEM261 | SATB1 | HNRNPU | P4HB | CCDC69 | TRAPPC2L | FAM32A | SSR3 | MRPL20 | TXN2 | EIF5B | ANXA11 | GADD45GIP1 | SRI | ITGB7 | DDX6 | LILRA2 | CBR1 | MYO9B | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CLK3 | STK4 | MTIF3 | BEX4 | FHIT | NCL | DEK | HTATIP2 | C7orf50 | PMVK | UQCRC1 | TOMM20 | MRPL41 | MMADHC | CD47 | ACP1 | SIVA1 | DNAJC19 | SDF4 | EMR2 | TNFRSF1A | UBE2R2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SNX9 | POLR3GL | FRG1 | CYLD | USP10 | NHP2 | UBE2L3 | ARHGAP4 | BNIP2 | YWHAE | SERPINB6 | C19orf60 | NDUFA7 | SLTM | NDUFB4 | PSMA1 | RALY | RNF139 | DERL1 | MS4A4A | ADRBK1 | LACTB | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| OCIAD1 | CSRNP1 | ASF1A | UXS1 | PA2G4 | C19orf10 | MRPL40 | MRPS23 | PARVG | LSMD1 | IDS | RNF187 | HAGH | NDUFA12 | RPS19BP1 | BSG | FKBP11 | PRPF31 | CTD-2006K23.1 | TCIRG1 | CDKN1A | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MPHOSPH8 | SLC38A1 | CISH | NOL7 | MYEOV2 | MTHFS | FAM173A | ACAA1 | PHPT1 | ATF4 | OLA1 | COQ7 | PSMA2 | RPN2 | TCF25 | SKAP1 | SPSB3 | C1QA | NAMPT | CREG1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CRBN | CDKN1B | PASK | MDH1 | PSMB2 | DNAJC4 | HBP1 | NFKBIZ | NDUFB1 | DGUOK | MRPL52 | FBXW5 | SSB | BANF1 | POLR2G | NAP1L4 | EBP | ZNF703 | ZNF106 | FUOM | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TMEM243 | LSM5 | TNFRSF25 | CCT8 | MRPL51 | PRPF8 | TIMMDC1 | VMP1 | SPG21 | MRPS21 | MPG | PNKD | CELF2 | HMGN3 | NDUFS8 | YTHDF2 | GCHFR | CEBPA | AP2A1 | FBP1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PPIG | G3BP1 | CCDC104 | HSPE1 | TXN | ZNF581 | ABTB1 | CYTH4 | IFI27L2 | CAMLG | NDUFA9 | TINF2 | RPS27L | KARS | BUD31 | STT3B | MFSD10 | ALDH3B1 | C11orf21 | PDXK | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MED4 | MPHOSPH10 | INTS12 | PHB | SNRPD1 | ACAP2 | ZNF511 | MTPN | VIMP | PPP1CC | EMC6 | SMARCB1 | NUDC | SHISA5 | RNF213 | REXO2 | UBA2 | C1QB | NRROS | PLIN3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| UGP2 | THAP7 | NELL2 | CCT3 | SNRPC | RPL26L1 | TPP1 | MYD88 | COMT | NDUFAF3 | MVP | SET | CAPN2 | IMP3 | ATP6AP2 | RBL2 | ALDH9A1 | HCAR3 | MANBA | ATF3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DCK | ID3 | AKTIP | EWSR1 | MRPS34 | TRMT1 | IFNGR1 | AKIRIN2 | MAP2K3 | ANXA7 | NDUFC1 | LRCH4 | IK | C16orf13 | MAP1LC3B | NOP58 | ORAI1 | CXCL3 | MBOAT7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| WTAP | SLC25A45 | LINC00176 | HNRNPR | PSMD7 | FAM195A | DOCK2 | COMMD9 | CHCHD5 | HAX1 | COX7B | SIAH2 | EIF4H | CDC42SE1 | RER1 | CD96 | SURF4 | PRKCD | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ORMDL1 | RP11-489E7.4 | CD28 | CBX3 | ILF2 | FAM192A | IFIT3 | STX10 | MBNL1 | MRPS16 | UBE2J1 | HMHA1 | SPAG7 | FMNL1 | SH3BP5 | B3GAT3 | SGK1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CCDC107 | GPATCH4 | SCGB3A1 | CACYBP | CCT5 | CINP | ERICH1 | HM13 | NR4A2 | ZNRD1 | C12orf10 | JAK1 | SDHC | DCTN3 | EMB | PLGRKT | RAB34 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TAGAP | TNRC6C | OSTC | ATP5G1 | ARPC1A | VKORC1 | MRP63 | MKKS | SMCHD1 | TRAM1 | EMC4 | PCMT1 | PRKCH | GBP5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PDCD4 | RP11-291B21.2 | THOC7 | SHFM1 | RFXANK | ECHDC1 | SF3B2 | STRA13 | TANK | COPS6 | APH1A | CDKN2D | NUDT16L1 | PRKD2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TCP1 | HAPLN3 | PRMT1 | ANP32A | PCGF5 | SLA | RAD23A | GPI | STX5 | RAB2A | ARL5A | RBCK1 | ODF2L | TRPV2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CCNG1 | HSPD1 | NDUFAB1 | SNX5 | SNAPIN | TRAPPC6A | ANAPC15 | THRAP3 | DSTN | SF3B14 | LAPTM4A | C14orf1 | STAMBP | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NONO | MATR3 | NUTF2 | ERCC1 | RTN3 | PHB2 | KXD1 | FAM96A | CCDC115 | PIN1 | CSK | CYB5B | MLLT11 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| IL16 | EIF5A | HSD17B10 | NCKAP1L | PEPD | RSL24D1 | MRPL16 | ISCA2 | RHOF | SRRM2 | CHMP5 | PSMD5-AS1 | SYNE1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LUC7L3 | PDCD5 | POLR2E | IL10RB | STK38 | STK17B | SLIRP | AAMP | EAPP | CFLAR | SSNA1 | ORMDL3 | TFDP2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| FNTA | UBE2N | PPP1R7 | PFKL | C9orf89 | LINC00493 | UQCRB | CHPT1 | COPS5 | CCNDBP1 | WASF2 | CCDC167 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| N4BP2L2 | MAGOH | SEC11C | UNC119 | CDV3 | GGNBP2 | CTNNBL1 | REL | SNX6 | COPZ1 | KDELR2 | GIMAP6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TPR | PPHLN1 | SNRPD3 | ATP2B1 | HEXB | NSMCE1 | FAM50A | ITGAE | SARS | TMED10 | COX6A1P2 | PYCR2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| G3BP2 | VDAC3 | PSMC3 | RABGAP1L | ZNF524 | CWC15 | KDELR1 | USF2 | FBXO7 | DDOST | SELPLG | RNF115 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ELOVL5 | NUCKS1 | OTUB1 | NFYC | VMA21 | PDCD2 | IRF2BP2 | ITGA4 | GNAI3 | SDHD | WSB1 | PTPN4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SCAF11 | AIP | MRPS7 | MTSS1 | PLD3 | MRPS18B | PSMD11 | NSFL1C | CCM2 | MRPL34 | C7orf73 | SYNE2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PRPF38B | RBM17 | PSMA3 | RALB | ATP6V1B2 | PRRC2C | HARS | GRPEL1 | SNHG15 | TMEM160 | CMTM6 | TMEM87A | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SLC3A2 | GTF2A2 | CDK2AP2 | DPEP2 | CHIC2 | CNPY2 | SEC13 | SAMD9L | ABT1 | ZMAT2 | SUCLG1 | RBM38 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DPY30 | STRAP | SRM | SNAP29 | OAS3 | RP11-1143G9.4 | FAM89B | CCS | RNF5 | IFI16 | PRDX3 | THAP11 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CYTH1 | SNRPE | UFD1L | GCH1 | CEBPG | NECAP2 | NIT2 | ACO2 | FAM162A | TRAPPC3 | MIEN1 | OBFC1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CCDC59 | EPC1 | POMP | GINM1 | CBWD1 | DDX17 | FLYWCH2 | MRPL12 | CCNH | C17orf62 | RNASEH2C | CD59 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| WHSC1L1 | RNF126 | MRPL11 | RIPK2 | TXNL1 | PTOV1 | STARD7 | KTN1 | PDAP1 | SH3GLB1 | CCND2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SF3B1 | ADH5 | GTF3C6 | UTP6 | H1FX | FH | TRIM38 | LCP2 | RNF166 | AKAP13 | PHACTR4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RBM23 | PPM1G | SDF2L1 | PACSIN2 | MED28 | TRA2A | BCL7B | TXNDC12 | ELOF1 | GOLT1B | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| BBX | SNRPF | PSMC2 | LINC01003 | SRRM1 | COMMD5 | GLUD1 | IDH3B | NDUFA6 | MTFP1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NAA38 | WBSCR22 | NDUFB3 | RBM25 | CKS1B | ELF1 | MRPL55 | ATP6V1E1 | CXCR3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| IRF9 | MAPRE1 | TUBB4B | DCXR | DUSP22 | COMMD8 | POLR2F | FKBP2 | GALM | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MAT2B | METTL23 | XRN2 | SFPQ | PTPN1 | DCTN2 | ARHGAP30 | CAST | ACD | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| GPBP1 | SNRPA1 | CARHSP1 | CHMP3 | SRSF6 | TMEM126B | DDX46 | SFT2D1 | TNFRSF18 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CHD2 | AKR1B1 | C11orf48 | DAP3 | TEN1 | NME4 | COMMD7 | CISD3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| EGLN2 | PPP1R12A | NDUFS4 | MRPS15 | COMMD4 | CCZ1 | PPP1R11 | NDUFB6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ARL2BP | STOML2 | TCEB1 | EIF2A | TSSC1 | RPF1 | AUP1 | UBE2F | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RBBP7 | MRPL9 | IMP4 | MFNG | CWC25 | YY1 | PPP2R1A | PSENEN | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RPAIN | MFF | MRPS18C | PSMF1 | CEPT1 | RAB11A | PSMD13 | FAM204A | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NKTR | PITHD1 | SEC61G | PHF5A | CHD9 | KPNB1 | PET100 | SCAMP2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PNN | ANAPC5 | ZDHHC12 | TAF9 | CD81 | PDCL3 | ITGB1BP1 | NAPA | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DARS | CDC123 | PTBP1 | NDUFA10 | RFC2 | GLG1 | TEX264 | OS9 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NCBP2 | SNRPG | HNRNPD | SNW1 | ACP5 | METTL5 | CSNK1A1 | ASNA1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| AATF | LYRM4 | NUDT1 | FDX1 | UROS | LSM1 | MLX | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RSF1 | GLRX3 | PAK2 | ECHS1 | MAGED2 | GSDMD | MEA1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CHCHD7 | SRP19 | SEPT2 | MT-ND4L | ZCRB1 | RTF1 | UBE2A | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| AL592183.1 | C17orf89 | AK2 | DHPS | CCDC53 | NME3 | EMC3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| BTF3L4 | SNRNP70 | RPS6KB2 | PMF1 | APOL3 | TAF12 | STK10 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| FLI1 | PSMD6 | TIMM17A | SUCLG2 | SHARPIN | PUF60 | TPGS1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NR3C1 | NUDT5 | DCTPP1 | PRKCSH | PPA2 | IAH1 | TMEM141 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ITSN2 | DDX39A | HMGN2 | BABAM1 | ZFAND2B | PDLIM2 | PKN1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZC3HAV1 | HSPA9 | MTCH2 | PABPC4 | SLFN5 | MGMT | UBE2E3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MTERFD2 | VCP | MRPL47 | ATG12 | CCDC90B | MRPS12 | CCDC124 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| GRAMD1A | SRSF1 | UQCC2 | NXT1 | TMBIM1 | BRD2 | COA3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZNF24 | EIF4A3 | MRPL36 | PPIE | TSNAX | CANX | RPN1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| USE1 | CLPP | NKAP | PIH1D1 | SEPHS2 | NAA20 | RAB4A | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| USP16 | TTC1 | DAZAP1 | UBE2E1 | SLC44A2 | NOP56 | TADA3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ARFGAP2 | IDI1 | MRPL15 | QARS | ASCC2 | FUNDC2 | MRPS11 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| FRG1B | MRPS26 | TIMM10 | GID8 | BFAR | MTCH1 | DNAJC2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TIMM9 | HNRNPUL1 | DTYMK | MRPL3 | CCDC25 | TXNL4A | FKBP5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MARCH7 | HINT2 | TOP1 | USP15 | NRBP1 | PTRHD1 | RRP7A | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LENG1 | SLBP | PSMG3 | MRPL32 | PHF3 | SLC25A11 | DNAJB11 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CYB5A | PSMD14 | DLD | MAPKAPK5-AS1 | SYS1 | COX16 | YIPF3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SMIM19 | FKBP3 | DCPS | PTP4A1 | PPM1B | NDUFV1 | DIAPH1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LINC-PINT | SDHA | TFDP1 | EIF1AX | YAF2 | ARHGAP9 | HNRNPH2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| OARD1 | NCBP2-AS2 | RP11-139H15.1 | LNPEP | LSM3 | PFDN6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TTC14 | SNRNP40 | FXR1 | PEX16 | GPAA1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TMEM242 | RAD21 | EIF2B1 | SRSF4 | MAPK1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DDIT3 | VBP1 | ESD | PHF11 | ACAA2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SAFB2 | C14orf119 | SF3A3 | VTI1B | PANK2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CEBPZ-AS1 | FAM177A1 | MCTS1 | AKR7A2 | IFNAR2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| BRD9 | HPRT1 | CCNC | URM1 | MAD2L2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PRDM2 | SLC25A39 | COA6 | AHSA1 | CCDC88C | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CD84 | RPA3 | PHYKPL | JAGN1 | RAB4B | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| EGR1 | SNRPA | CAMTA1 | GRHPR | G6PD | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| GLRX5 | C19orf25 | ROMO1 | ARFGAP3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TIAL1 | CHCHD1 | PFDN1 | BCCIP | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| EXOSC8 | EIF4E | STX8 | RABL6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LBR | C11orf83 | RNPEPL1 | SMC4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ILF3 | SREK1 | TMEM248 | MVD | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SAP30BP | ANKRD11 | GGA1 | MRPS28 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ACTR10 | TMEM138 | IFT20 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NUDT21 | PTGES2 | AKAP9 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| UBE2G1 | LGALS8 | C18orf32 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LARP7 | MLEC | BRMS1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PPIH | C5orf15 | CAPN1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| EBNA1BP2 | FEM1B | DDRGK1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| VOPP1 | RAD23B | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| GNL3 | WDR33 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CISD2 | WDR61 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SSRP1 | SUGT1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PDIA4 | LAGE3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SYNCRIP | RBBP4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| C1orf35 | ESYT1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MAP7D1 | APOA1BP | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DNAJC9 | MIF4GD | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HAUS1 | CFDP1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ILKAP | UBE2J2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RPUSD3 | MRPS5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CDKN2AIPNL | SRPK2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| POLD2 | FAM200B | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HNRNPAB | C17orf49 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TMEM106C | NUBP2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CBX1 | PRKAG1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NAA50 | SURF2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MCM3 | SSSCA1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CISD1 | EI24 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TFPT | CSNK1D | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| UBA5 | DCTD | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LMAN1 | PEX2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PGRMC2 | PNKP | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| C19orf48 | TMEM70 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| C14orf142 | JMJD6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PAICS | DCAF5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RCE1 |
If you want to quickly find which module a particular feature was assigned to, the featureModuleLookup function can be used. Here we will look up the T-cell marker gene “CD3E” along with “S100A8”:
mod <- featureModuleLookup(sce, feature = c("CD3E", "S100A8"))
mod
##   CD3E S100A8
##     27     70
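Conceptually, this kind of lookup is a named-vector index from feature to module. The following is a minimal base R sketch of that idea, not the celda implementation; the assignment vector and the helper lookupModule are hypothetical and made up for illustration:

```r
# Toy feature-to-module assignments (hypothetical; not the fitted model
# from this tutorial)
moduleAssignments <- c(CD3E = 27L, CD3D = 27L, S100A8 = 70L,
                       S100A9 = 70L, CD79A = 12L)

# Hypothetical helper: return the module of each requested feature
lookupModule <- function(assignments, feature) {
  notFound <- setdiff(feature, names(assignments))
  if (length(notFound) > 0) {
    stop("Features not found: ", paste(notFound, collapse = ", "))
  }
  assignments[feature]
}

mod <- lookupModule(moduleAssignments, c("CD3E", "S100A8"))
mod

# The reverse question: which features share a module with CD3E?
sameModule <- names(moduleAssignments)[moduleAssignments == mod["CD3E"]]
sameModule
```

Failing loudly on unknown feature names is a design choice in this sketch; it avoids silently returning NA when a gene symbol is misspelled.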
The function moduleHeatmap can be used to view the expression of features across cells for a specific module. The featureModule parameter denotes the module(s) to be displayed. Cells are ordered from those with the lowest probability of the module on the left to the highest probability on the right. Similarly, features are ordered from those with the highest probability within the module on the top to the lowest probability on the bottom.
moduleHeatmap(sce, featureModule = 27, useAssay = useAssay, altExpName = altExpName)
The parameter topCells can be used to control the number of cells included in the heatmap. By default, only the 100 cells with the lowest probabilities and the 100 cells with the highest probabilities for each selected module are included (i.e. topCells = 100 by default). To display all cells, this parameter can be set to NULL:
moduleHeatmap(sce, featureModule = 27, topCells = NULL, useAssay = useAssay, altExpName = altExpName)
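The effect of topCells can be illustrated with a small base R sketch. This is an assumption about the selection logic as described above (lowest and highest probabilities, ordered left to right), not code taken from celda; the probabilities are simulated and selectTopCells is a hypothetical helper:

```r
set.seed(1)

# Simulated probabilities of one module in 1,000 cells (hypothetical data)
prob <- runif(1000)
names(prob) <- paste0("cell", seq_along(prob))

# Hypothetical helper: keep the `topCells` lowest and `topCells` highest
# cells, ordered from lowest to highest probability
selectTopCells <- function(prob, topCells = 100) {
  ord <- order(prob)  # ascending order of probability
  n <- length(ord)
  if (is.null(topCells) || 2 * topCells >= n) {
    return(names(prob)[ord])  # keep every cell
  }
  keep <- c(ord[seq_len(topCells)], ord[(n - topCells + 1):n])
  names(prob)[keep]
}

sel <- selectTopCells(prob, topCells = 100)
length(sel)

allCells <- selectTopCells(prob, topCells = NULL)
length(allCells)
```

Restricting the heatmap to the two extremes keeps the plot readable for large datasets while still showing the contrast between cells with low and high module probability.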
Note: Multiple modules can be displayed by giving a vector of module indices to the parameter featureModule. If featureModule is not specified, then all modules will be plotted.
The function plotDimReduceModule can be used to visualize the probabilities of a particular module or set of modules on a reduced dimensional plot such as a UMAP. This can be another quick way to see how modules are expressed across cells in 2-D space. As an example, we can look at module 70, which contains S100A8:
plotDimReduceModule(sce, modules = 70, useAssay = useAssay, altExpName = altExpName, reducedDimName = "celda_UMAP")
Similarly, multiple modules can be plotted in a grid of UMAPs:
plotDimReduceModule(sce, modules = 70:78, useAssay = useAssay, altExpName = altExpName, reducedDimName = "celda_UMAP")
In this grid, we can see that module 70 (which has high levels of S100A8 and S100A9) is highly expressed in cell populations 2 and 3, module 71 (which contains CD14) can be used to identify all CD14+ monocytes, module 72 (which contains CST3) is expressed across both CD14 and FCGR3A (CD16) expressing monocytes, and module 73 (which contains CD4) is expressed broadly across both monocytes and dendritic cells as well as some T-cell populations. If we were interested in defining transcriptional programs active across all monocytes, we could examine the genes found in module 72. If we were interested in defining transcriptional programs for all CD14+ monocytes, we could examine the genes in module 71. These patterns can also be observed in the probability map.
In the celda probability map, we saw that the unknown T-cell population 13 had high levels of module 30. We can examine both module heatmaps and module probability maps to further explore this:
moduleHeatmap(sce, featureModule = 30, useAssay = useAssay, altExpName = altExpName)
plotDimReduceModule(sce, modules = 30, useAssay = useAssay, altExpName = altExpName, reducedDimName = "celda_UMAP")
Module 30 has high levels of genes associated with proliferation including HMGA1, STMN1, PCNA, HMGB2, and TUBA1B. We can therefore re-label these cells as “Proliferating T-cells”.
In addition to examining modules, differential expression can be used to identify potential marker genes up-regulated in specific cell populations. The function findMarkerDiffExp in the singleCellTK package will find markers up-regulated in each cell population compared to all the others.
# Normalize counts (if not performed previously)
library(scater)
sce <- logNormCounts(sce, exprs_values = useAssay, name = "logcounts")
# Run differential expression analysis
sce <- findMarkerDiffExp(sce, useAssay = "logcounts", method = "wilcox", cluster = celdaClusters(sce), minMeanExpr = 0, fdrThreshold = 0.05, log2fcThreshold = 0, minClustExprPerc = 0, maxCtrlExprPerc = 1)
The function plotMarkerDiffExp can be used to plot the results in a heatmap. The topN parameter will plot the top N ranked genes for each cluster.
# Plot differentially expressed genes that pass additional thresholds 'minClustExprPerc' and 'maxCtrlExprPerc'
plotMarkerDiffExp(sce, topN = 5, log2fcThreshold = 0, rowLabel = TRUE, fdrThreshold = 0.05, minClustExprPerc = 0.6, maxCtrlExprPerc = 0.4, minMeanExpr = 0)
Other parameters such as minClustExprPerc (the minimum fraction of cells expressing the marker gene in the cluster) and maxCtrlExprPerc (the maximum fraction of cells expressing the marker gene in other clusters) can be used to control how specific each marker gene is to each cell population. Similarly, adding a log2 fold-change cutoff (e.g. 1) can select for markers that are more strongly up-regulated in a cell population.
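To make these thresholds concrete, here is a toy base R calculation for a single gene. It is a sketch under stated assumptions, not the singleCellTK implementation: the expression values are invented, "expressing" is assumed to mean a value greater than zero, and the log2 fold-change is approximated as the difference in mean log-normalized expression:

```r
# Invented log-normalized expression of one gene in 11 cells
expr <- c(2.1, 0, 1.8, 2.5, 1.2,   # cells in the cluster of interest
          0, 0, 0.4, 0, 0, 0)      # cells in all other clusters
inCluster <- rep(c(TRUE, FALSE), times = c(5, 6))

# Fraction of cells with non-zero expression inside and outside the cluster
clustExprPerc <- mean(expr[inCluster] > 0)
ctrlExprPerc <- mean(expr[!inCluster] > 0)

# Approximate log2 fold-change on the log scale (assumption, see above)
log2fc <- mean(expr[inCluster]) - mean(expr[!inCluster])

# Apply the same cutoffs used in the plotting call above, plus a
# log2 fold-change cutoff of 1
isMarker <- clustExprPerc >= 0.6 && ctrlExprPerc <= 0.4 && log2fc >= 1
c(clustExprPerc = clustExprPerc, ctrlExprPerc = ctrlExprPerc, log2fc = log2fc)
isMarker
```

Raising minClustExprPerc or lowering maxCtrlExprPerc makes the filter stricter, so fewer but more cluster-specific genes pass.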
The plotCeldaViolin function can be used to examine the distribution of expression of various features across cell population clusters derived from celda. Here we can see that the gene CD79A has high expression in the B-cell cluster and HMGB2 has high expression in the proliferating T-cell population.
# Normalize counts if not performed in previous steps
library(scater)
sce <- logNormCounts(sce, exprs_values = useAssay, name = "logcounts")
# Make violin plots for marker genes
plotCeldaViolin(sce, useAssay = "logcounts", features = c("CD79A", "HMGB2"))
The celda package comes with two functions for generating comprehensive HTML reports that 1) capture the process of selecting K/L for a celda_CG model and 2) plot the results from the downstream analysis. The first report runs both recursiveSplitModule and recursiveSplitCell for selection of L and K, respectively. To recapitulate the complete analysis presented in this tutorial in the HTML report, the following command can be used:
sce <- reportCeldaCGRun(sce, sampleLabel = NULL, useAssay = useAssay, altExpName = altExpName, minCell = 3, minCount = 3, initialL = 10, maxL = 150, initialK = 3, maxK = 25, L = 80, K = 14)
All of the parameters in this function are the same as those used throughout this tutorial in the selectFeatures, recursiveSplitModule, and recursiveSplitCell functions. Note that this report does not perform cell filtering, so filtering must be completed before running this function. The returned SCE object will contain the celda_CG model with the selected K and L, which can be used in any of the downstream plotting functions and as input to the second plotting report described next.
The second report takes in as input an SCE object with a fitted celda_CG model and systematically generates several plots that facilitate exploratory analysis including cell subpopulation cluster labels on 2-D embeddings, user-specified annotations on 2-D embeddings, module heatmaps, module probabilities, expression of marker genes on 2-D embeddings, and the celda probability map. The report can be generated with the following code:
reportCeldaCGPlotResults(sce, reducedDimName = "celda_UMAP", features = markers, useAssay = useAssay, altExpName = altExpName, cellAnnot = c("total", "detected", "decontX_contamination", "subsets_mito_percent"), cellAnnotLabel = "scDblFinder_class")
User-supplied annotations to plot on the 2-D embedding can be specified through the cellAnnot and cellAnnotLabel parameters. Both parameters allow variables stored in the colData of the SCE to be plotted on the 2-D embedding specified by the reducedDimName parameter. For cellAnnot, integer and numeric variables will be plotted as continuous variables, while factors and characters will be plotted as categorical variables. For cellAnnotLabel, all variables will be coerced to a factor and the labels of the categories will be plotted on the scatter plot.
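The type rule for cellAnnot can be sketched in base R. This only illustrates the rule as stated above and is not the report's internal code; the data frame is a hypothetical stand-in for colData with invented values:

```r
# Hypothetical stand-in for colData(sce) with invented values
colInfo <- data.frame(
  total = c(1200, 3400, 2100),                # numeric
  detected = c(500L, 900L, 700L),             # integer
  scDblFinder_class = c("singlet", "doublet", "singlet"),
  stringsAsFactors = FALSE
)

# Integer and numeric columns are treated as continuous;
# character and factor columns are treated as categorical
plotType <- vapply(
  colInfo,
  function(x) if (is.numeric(x)) "continuous" else "categorical",
  character(1)
)
plotType

# For label-style annotations, everything is coerced to a factor first
labelLevels <- levels(as.factor(colInfo$scDblFinder_class))
labelLevels
```

Checking column types this way before generating the report can help catch a cluster label that was accidentally stored as an integer and would otherwise be drawn on a continuous color scale.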
The celda model factorizes the original matrix into three matrices:
1) module - The probability of each feature in each module (Psi)
2) cellPopulation - The probability of each module in each cell population (Phi)
3) sample - The probability of each cell population in each sample (Theta)
Additionally, we can calculate the probability of each module within each cell (cell). The cell matrix can essentially be used to replace PCs from PCA and is useful for downstream visualization (e.g. generating 2-D embeddings). All of these matrices can be retrieved with the factorizeMatrix function. The matrices are returned in three different versions: unnormalized counts, proportions (normalized by the total), or posterior estimates (where the Dirichlet concentration parameter is added in before normalization).
# Factorize the original counts matrix
fm <- factorizeMatrix(sce)
# Three different versions of each matrix:
names(fm)
## [1] "counts" "proportions" "posterior"
# Get normalized proportional matrices
dim(fm$proportions$cell) # Matrix of module probabilities for each cell
## [1] 80 2675
dim(fm$proportions$module) # Matrix of feature probabilities for each module
## [1] 2639 80
dim(fm$proportions$cellPopulation) # Matrix of module probabilities for each cell population
## [1] 80 14
dim(fm$proportions$sample) # Matrix of cell population probabilities in each sample
## [1] 14 1
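As a rough illustration of how the three versions relate to each other, the sketch below builds them by hand in base R for a tiny, made-up module-by-cell-population count matrix. The matrix values and the concentration parameter `beta` are assumptions chosen for this example only; this is not the package's own code, and the real matrices should be taken from factorizeMatrix.

```r
# Hypothetical counts: 3 modules (rows) x 2 cell populations (columns)
counts <- matrix(c(10, 0, 30,
                    5, 15, 0),
                 nrow = 3,
                 dimnames = list(paste0("L", 1:3), paste0("K", 1:2)))

# Assumed Dirichlet concentration parameter (illustrative value)
beta <- 1

# Proportions: each column divided by its total
proportions <- sweep(counts, 2, colSums(counts), "/")

# Posterior estimates: add the concentration parameter, then normalize
posterior <- sweep(counts + beta, 2, colSums(counts + beta), "/")

print(round(proportions, 3))
print(round(posterior, 3))
```

In this toy case the posterior version never contains exact zeros, because the concentration parameter acts as a pseudocount added to every entry before normalization.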
The parameter displayName can be used to change the row labels from the rownames to a column in the rowData of the SCE object. This parameter is available in plotDimReduceFeature and moduleHeatmap. For example, if we had not changed the rownames to Symbol_TENx at the beginning of the tutorial, the following code could still be run in moduleHeatmap to display the gene symbols even if the rownames were set to the original Ensembl IDs:
moduleHeatmap(sce, featureModule = 27, useAssay = useAssay, altExpName = altExpName, displayName = "Symbol_TENx")
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] scater_1.18.6 kableExtra_1.3.4
## [3] knitr_1.31 ggplot2_3.3.5
## [5] celda_1.12.0 singleCellTK_2.5.1
## [7] TENxPBMCData_1.8.0 HDF5Array_1.18.1
## [9] rhdf5_2.34.0 DelayedArray_0.16.2
## [11] Matrix_1.3-2 SingleCellExperiment_1.12.0
## [13] SummarizedExperiment_1.20.0 Biobase_2.50.0
## [15] GenomicRanges_1.42.0 GenomeInfoDb_1.26.4
## [17] IRanges_2.24.1 S4Vectors_0.28.1
## [19] BiocGenerics_0.36.0 MatrixGenerics_1.2.1
## [21] matrixStats_0.58.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.1 reticulate_1.18
## [3] R.utils_2.10.1 tidyselect_1.1.0
## [5] RSQLite_2.2.4 AnnotationDbi_1.52.0
## [7] grid_4.0.4 combinat_0.0-8
## [9] BiocParallel_1.24.1 Rtsne_0.15
## [11] scDblFinder_1.4.0 DropletUtils_1.10.3
## [13] munsell_0.5.0 codetools_0.2-18
## [15] ragg_1.1.3 statmod_1.4.35
## [17] scran_1.18.5 xgboost_1.3.2.1
## [19] withr_2.4.1 colorspace_2.0-0
## [21] highr_0.8 rstudioapi_0.13
## [23] assertive.base_0.0-9 labeling_0.4.2
## [25] GenomeInfoDbData_1.2.4 GSVAdata_1.26.0
## [27] bit64_4.0.5 farver_2.1.0
## [29] rprojroot_2.0.2 vctrs_0.3.6
## [31] generics_0.1.0 xfun_0.22
## [33] BiocFileCache_1.14.0 fishpond_1.6.0
## [35] R6_2.5.0 doParallel_1.0.16
## [37] ggbeeswarm_0.6.0 clue_0.3-58
## [39] rsvd_1.0.3 RcppEigen_0.3.3.9.1
## [41] locfit_1.5-9.4 bitops_1.0-6
## [43] rhdf5filters_1.2.0 cachem_1.0.4
## [45] gridGraphics_0.5-1 assertthat_0.2.1
## [47] promises_1.2.0.1 scales_1.1.1
## [49] beeswarm_0.3.1 gtable_0.3.0
## [51] beachmat_2.6.4 Cairo_1.5-12.2
## [53] rlang_0.4.10 systemfonts_1.0.1
## [55] GlobalOptions_0.1.2 BiocManager_1.30.10
## [57] yaml_2.2.1 reshape2_1.4.4
## [59] httpuv_1.5.5 tools_4.0.4
## [61] ellipsis_0.3.1 jquerylib_0.1.3
## [63] RColorBrewer_1.1-2 Rcpp_1.0.6
## [65] plyr_1.8.6 sparseMatrixStats_1.2.1
## [67] zlibbioc_1.36.0 purrr_0.3.4
## [69] RCurl_1.98-1.2 dbscan_1.1-6
## [71] GetoptLong_1.0.5 viridis_0.5.1
## [73] cowplot_1.1.1 cluster_2.1.0
## [75] ggrepel_0.9.1 fs_1.5.0
## [77] magrittr_2.0.1 data.table_1.14.0
## [79] RSpectra_0.16-0 magick_2.7.0
## [81] circlize_0.4.12 mime_0.10
## [83] evaluate_0.14 xtable_1.8-4
## [85] gridExtra_2.3 shape_1.4.5
## [87] compiler_4.0.4 tibble_3.1.0
## [89] crayon_1.4.1 R.oo_1.24.0
## [91] htmltools_0.5.1.1 later_1.1.0.1
## [93] MCMCprecision_0.4.0 DBI_1.1.1
## [95] ExperimentHub_1.16.0 assertive.files_0.0-2
## [97] dbplyr_2.1.0 ComplexHeatmap_2.6.2
## [99] rappdirs_0.3.3 assertive.numbers_0.0-2
## [101] assertive.types_0.0-3 R.methodsS3_1.8.1
## [103] igraph_1.2.6 pkgconfig_2.0.3
## [105] pkgdown_1.6.1 scuttle_1.0.4
## [107] xml2_1.3.2 foreach_1.5.1
## [109] svglite_2.0.0 vipor_0.4.5
## [111] bslib_0.2.4 dqrng_0.2.1
## [113] webshot_0.5.2 XVector_0.30.0
## [115] rvest_1.0.0 stringr_1.4.0
## [117] digest_0.6.27 rmarkdown_2.7
## [119] enrichR_3.0 uwot_0.1.10
## [121] edgeR_3.32.1 DelayedMatrixStats_1.12.3
## [123] curl_4.3 shiny_1.6.0
## [125] gtools_3.8.2 rjson_0.2.20
## [127] lifecycle_1.0.0 jsonlite_1.7.2
## [129] Rhdf5lib_1.12.1 BiocNeighbors_1.8.2
## [131] desc_1.3.0 viridisLite_0.3.0
## [133] limma_3.46.0 fansi_0.4.2
## [135] pillar_1.5.1 lattice_0.20-41
## [137] fastmap_1.1.0 httr_1.4.2
## [139] interactiveDisplayBase_1.28.0 glue_1.4.2
## [141] FNN_1.1.3 png_0.1-7
## [143] iterators_1.0.13 multipanelfigure_2.1.2
## [145] bluster_1.0.0 BiocVersion_3.12.0
## [147] bit_4.0.4 assertive.properties_0.0-4
## [149] stringi_1.5.3 sass_0.3.1
## [151] blob_1.2.1 textshaping_0.3.5
## [153] BiocSingular_1.6.0 AnnotationHub_2.22.0
## [155] memoise_2.0.0 dplyr_1.0.5
## [157] irlba_2.3.3
Droplet-based microfluidic devices have become widely used to perform single-cell RNA sequencing (scRNA-seq). However, ambient RNA present in the cell suspension can be aberrantly counted along with a cell’s native mRNA and result in cross-contamination of transcripts between different cell populations. DecontX is a Bayesian method to estimate and remove contamination in individual cells. DecontX assumes the observed expression of a cell is a mixture of counts from two multinomial distributions: (1) a distribution of native transcript counts from the cell’s actual population and (2) a distribution of contaminating transcript counts from all other cell populations captured in the assay. Overall, computational decontamination of single cell counts can aid in downstream clustering and visualization.
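The two-multinomial assumption can be pictured with a small simulation. The base R sketch below is only an illustration of the generative idea, not decontX itself: the gene names, the two expression profiles, the total count, and the contamination fraction are all made-up values.

```r
set.seed(1)

# Hypothetical expression profiles over 4 genes (each sums to 1)
native  <- c(geneA = 0.70, geneB = 0.25, geneC = 0.05, geneD = 0.00)
ambient <- c(geneA = 0.10, geneB = 0.10, geneC = 0.30, geneD = 0.50)

total_counts  <- 1000   # total transcripts observed in the cell
contamination <- 0.2    # assumed fraction of counts that are contamination

# Split the total into native and contaminating counts, then draw each
# part from its own multinomial distribution
n_contam <- rbinom(1, size = total_counts, prob = contamination)
n_native <- total_counts - n_contam

native_counts <- rmultinom(1, size = n_native, prob = native)[, 1]
contam_counts <- rmultinom(1, size = n_contam, prob = ambient)[, 1]

# The observed profile is the sum of the two components
observed <- native_counts + contam_counts
print(rbind(native_counts, contam_counts, observed))
```

In this toy cell, geneD is never produced by the native profile, so every geneD count in the observed vector comes from the contaminating component. Decontamination aims to attribute such counts to the ambient source.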
The package can be loaded using the library command.
library(celda)
DecontX can take either a SingleCellExperiment object or a counts matrix as input. decontX will attempt to convert any input matrix to class dgCMatrix from package Matrix before starting the analysis.
To import datasets directly into an SCE object, the singleCellTK package has several importing functions for different preprocessing tools including CellRanger, STARsolo, BUStools, Optimus, DropEST, SEQC, and Alevin/Salmon. For example, the following code can be used as a template to read in the filtered and raw matrices for multiple samples processed with CellRanger:
library(singleCellTK)
sce <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"))
Within each sample directory, there should be subfolders called "outs/filtered_feature_bc_matrix/" or "outs/raw_feature_bc_matrix/" with files called matrix.mtx.gz, features.tsv.gz and barcodes.tsv.gz. If these files are in different subdirectories, the importCellRangerV3Sample function can be used to import data from a different directory instead.
Optionally, the “raw” or “droplet” matrix can also be easily imported by setting the dataType argument to “raw”:
sce.raw <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"), dataType = "raw")
The raw matrix can be passed to the background parameter in decontX as described below. If using Seurat, go to the Working with Seurat section for details on how to convert between SCE and Seurat objects.
We will utilize the 10X PBMC 4K dataset as an example in this vignette. This data can be easily retrieved from the package TENxPBMCData. Make sure the column names are set before running decontX.
A SingleCellExperiment (SCE) object or a sparse matrix containing the counts for filtered cells can be passed to decontX via the x parameter. The matrix to use in an SCE object can be specified with the assayName parameter, which is set to "counts" by default. There are two major ways to run decontX: with and without the raw/droplet matrix containing empty droplets. Here is an example of running decontX without supplying the background:
sce <- decontX(sce)
In this scenario, decontX will estimate the contamination distribution for each cell cluster based on the profiles of the other cell clusters in the filtered dataset. The estimated contamination results can be found in colData(sce)$decontX_contamination and the decontaminated counts can be accessed with decontXcounts(sce). decontX will perform heuristic clustering to quickly define major cell clusters. However, if you have your own cell cluster labels, they can be specified with the z parameter. These results will be used throughout the rest of the vignette.
The raw/droplet matrix can be used to empirically estimate the distribution of ambient RNA, which is especially useful when cells that contributed to the ambient RNA are not accurately represented in the filtered count matrix containing the cells. For example, cells that were removed via flow cytometry or that were more sensitive to lysis during dissociation may have contributed to the ambient RNA but were not measured in the filtered/cell matrix. The raw/droplet matrix can be input as an SCE object or a sparse matrix using the background parameter:
sce <- decontX(sce, background = sce.raw)
Only empty droplets in the background matrix should be used to estimate the ambient RNA. If any cell ids (i.e. colnames) in the raw/droplet matrix supplied to the background parameter are also found in the filtered counts matrix (x), decontX will automatically remove them from the raw matrix. However, if the cell ids are not available for the input matrices, decontX will treat the entire background input as empty droplets. All of the outputs are the same as when running decontX without setting the background parameter.
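The barcode matching described above can be sketched with plain matrices. The following base R example is a hedged illustration of the idea only (decontX does this step for you); the gene names, barcodes, and counts are hypothetical.

```r
# Hypothetical matrices: genes in rows, barcodes in columns
filtered <- matrix(1:6, nrow = 2,
                   dimnames = list(c("g1", "g2"),
                                   c("AAAC", "AAAG", "AACT")))
raw <- matrix(1:12, nrow = 2,
              dimnames = list(c("g1", "g2"),
                              c("AAAC", "AAAG", "AACT",
                                "TTTA", "TTTC", "TTTG")))

# Keep only barcodes in the raw matrix that are absent from the filtered
# matrix; these are treated as empty droplets
empty_ids <- setdiff(colnames(raw), colnames(filtered))
background <- raw[, empty_ids, drop = FALSE]

print(colnames(background))
```

This also shows why column names matter: without matching ids there is no way to tell which columns of the raw matrix are real cells, so the whole matrix would be treated as background.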
Note: If the input object is just a matrix and not an SCE object, make sure to save the output into a variable with a different name (e.g. result <- decontX(mat)). The result object will be a list with contamination in result$contamination and the decontaminated counts in result$decontXcounts.
DecontX creates a UMAP which we can use to plot the cluster labels automatically identified in the analysis. Note that the clustering approach used here is designed to find “broad” cell types rather than individual cell subpopulations within a cell type.
umap <- reducedDim(sce, "decontX_UMAP")
plotDimReduceCluster(x = sce$decontX_clusters,
dim1 = umap[, 1], dim2 = umap[, 2])
The percentage of contamination in each cell can be plotted on the UMAP to visualize which clusters may have higher levels of ambient RNA.

Known marker genes can also be plotted on the UMAP to identify the cell types for each cluster. We will use CD3D and CD3E for T-cells, LYZ, S100A8, and S100A9 for monocytes, CD79A, CD79B, and MS4A1 for B-cells, GNLY for NK-cells, and PPBP for megakaryocytes.
library(scater)
sce <- logNormCounts(sce)
plotDimReduceFeature(as.matrix(logcounts(sce)),
dim1 = umap[, 1],
dim2 = umap[, 2],
features = c("CD3D", "CD3E", "GNLY",
"LYZ", "S100A8", "S100A9",
"CD79A", "CD79B", "MS4A1"),
exactMatch = TRUE)
The percentage of cells within a cluster that have detectable expression of marker genes can be displayed in a barplot. Markers for cell types need to be supplied in a named list. First, the detection of marker genes in the original counts assay is shown:
markers <- list(Tcell_Markers = c("CD3E", "CD3D"),
Bcell_Markers = c("CD79A", "CD79B", "MS4A1"),
Monocyte_Markers = c("S100A8", "S100A9", "LYZ"),
NKcell_Markers = "GNLY")
cellTypeMappings <- list(Tcells = 2, Bcells = 5, Monocytes = 1, NKcells = 6)
plotDecontXMarkerPercentage(sce,
markers = markers,
groupClusters = cellTypeMappings,
assayName = "counts")
We can then look to see how much decontX removed aberrant expression of marker genes in each cell type by changing the assayName to decontXcounts:
plotDecontXMarkerPercentage(sce,
markers = markers,
groupClusters = cellTypeMappings,
assayName = "decontXcounts")
Percentages of marker genes detected in other cell types were reduced or completely removed. For example, the percentage of cells that expressed monocyte marker genes was greatly reduced in T-cells, B-cells, and NK-cells. The original counts and decontaminated counts can be plotted side-by-side by listing multiple assays in the assayName parameter. This option is only available if the data is stored in a SingleCellExperiment object.
plotDecontXMarkerPercentage(sce,
markers = markers,
groupClusters = cellTypeMappings,
assayName = c("counts", "decontXcounts"))
Some helpful hints when using plotDecontXMarkerPercentage:
1) Cell clusters can be renamed and regrouped using the groupClusters parameter, which also needs to be a named list. If groupClusters is used, cell clusters not included in the list will be excluded from the barplot. For example, if we wanted to group T-cells and NK-cells together, we could set cellTypeMappings <- list(NK_Tcells = c(2,6), Bcells = 5, Monocytes = 1)
2) The expression level required for a gene to be considered detected in a cell can be adjusted with the threshold parameter.
3) If you are not using a SingleCellExperiment, then you will need to supply the original counts matrix or the decontaminated counts matrix as the first argument to generate the barplots.
Another useful way to assess the amount of decontamination is to view the expression of marker genes before and after decontX across cell types. Here we view the monocyte markers in each cell type. The violin plot shows that the markers have been removed from T-cells, B-cells, and NK-cells, but are largely unaffected in monocytes.
plotDecontXMarkerExpression(sce,
markers = markers[["Monocyte_Markers"]],
groupClusters = cellTypeMappings,
ncol = 3)
Some helpful hints when using plotDecontXMarkerExpression:
1) groupClusters works the same way as in plotDecontXMarkerPercentage.
2) A separate plot is drawn for each pair of marker and cluster (or cell type specified by groupClusters). Therefore, you may want to keep the number of markers small in each plot and call the function multiple times for different sets of marker genes.
3) Individual points can be plotted by setting plotDots = TRUE, and/or the points can be log transformed on the fly by setting log1p = TRUE.
4) Any assay in the SingleCellExperiment can be plotted. Therefore, you could also examine normalized expression of the original and decontaminated counts. For example:
library(scater)
sce <- logNormCounts(sce,
exprs_values = "decontXcounts",
name = "decontXlogcounts")
plotDecontXMarkerExpression(sce,
markers = markers[["Monocyte_Markers"]],
groupClusters = cellTypeMappings,
ncol = 3,
assayName = c("logcounts", "decontXlogcounts"))
The ability of DecontX to accurately identify contamination is dependent on the cell cluster labels. DecontX assumes that contamination for a cell cluster comes from a combination of counts from all other clusters. The default clustering approach used by DecontX tends to select fewer clusters that represent broader cell types. For example, all T-cells tend to be clustered together rather than splitting naive and cytotoxic T-cells into separate clusters. Custom cell type labels can be supplied via the z parameter if some cells are not being clustered appropriately by the default method.
There are ways to force decontX to estimate more or less contamination across a dataset by manipulating the priors. The delta parameter is a numeric vector of length two. It is the concentration parameter for the Dirichlet distribution which serves as the prior for the proportions of native and contamination counts in each cell. The first element is the prior for the proportion of native counts while the second element is the prior for the proportion of contamination counts. These essentially act as pseudocounts for the native and contamination in each cell. If estimateDelta = TRUE, delta is only used to produce a random sample of proportions for an initial value of contamination in each cell. Then delta is updated in each iteration. If estimateDelta = FALSE, then delta is fixed with these values for the entire inference procedure. Fixing delta and setting a high number in the second element will force decontX to be more aggressive and estimate higher levels of contamination in each cell at the expense of potentially removing native expression. For example, in the previous PBMC example, we can see what the estimated delta was by looking in the estimates:
metadata(sce)$decontX$estimates$all_cells$delta
## [1] 9.287164 1.038217
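To build intuition for why the elements of delta behave like pseudocounts, the base R sketch below computes the posterior mean of a contamination proportion for one cell under different priors. It is a simplification under stated assumptions: the native and contaminating counts are made-up numbers treated as known, whereas decontX infers them, and the function `contam_fraction` is a hypothetical helper, not part of celda.

```r
# Hypothetical counts currently attributed to each source in one cell
n_native <- 950
n_contam <- 50

# Posterior mean of the contamination proportion under a Dirichlet prior:
# the prior values are added to the counts before normalizing
contam_fraction <- function(n_native, n_contam, delta) {
  (n_contam + delta[2]) / (n_native + n_contam + sum(delta))
}

weak   <- contam_fraction(n_native, n_contam, delta = c(9, 1))
strong <- contam_fraction(n_native, n_contam, delta = c(9, 20))
heavy  <- contam_fraction(n_native, n_contam, delta = c(9, 500))

print(round(c(weak = weak, strong = strong, heavy = heavy), 3))
```

Raising the second element pulls the estimate upward, and the pull is stronger when the prior value is large relative to the number of counts in the cell.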
Setting a higher value in the second element of delta and estimateDelta = FALSE will force decontX to estimate higher levels of contamination per cell:
sce.delta <- decontX(sce, delta = c(9, 20), estimateDelta = FALSE)
plot(sce$decontX_contamination, sce.delta$decontX_contamination,
xlab = "DecontX estimated priors",
ylab = "Setting priors to estimate higher contamination")
abline(0, 1, col = "red", lwd = 2)
If you are using the Seurat package for downstream analysis, the following code can be used to read in a matrix and convert between Seurat and SCE objects:
# Read counts from CellRanger output
library(Seurat)
counts <- Read10X("sample/outs/filtered_feature_bc_matrix/")
# Create a SingleCellExperiment object and run decontX
sce <- SingleCellExperiment(list(counts = counts))
sce <- decontX(sce)
# Create a Seurat object from a SCE with decontX results
seuratObject <- CreateSeuratObject(round(decontXcounts(sce)))
Optionally, the “raw” matrix can also be imported and used as the background:
counts.raw <- Read10X("sample/outs/raw_feature_bc_matrix/")
sce.raw <- SingleCellExperiment(list(counts = counts.raw))
sce <- decontX(sce, background = sce.raw)
Note that the decontaminated matrix of decontX consists of floating point numbers and must be rounded to integers before adding it to a Seurat object. If you already have a Seurat object containing the counts matrix and would like to run decontX, you can retrieve the count matrix, create a SCE object, run decontX, and then add the result back to the Seurat object:
counts <- GetAssayData(object = seuratObject, slot = "counts")
sce <- SingleCellExperiment(list(counts = counts))
sce <- decontX(sce)
seuratObject[["decontXcounts"]] <- CreateAssayObject(counts = round(decontXcounts(sce)))
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] scater_1.18.6 ggplot2_3.3.5
## [3] TENxPBMCData_1.8.0 HDF5Array_1.18.1
## [5] rhdf5_2.34.0 DelayedArray_0.16.2
## [7] celda_1.12.0 Matrix_1.3-2
## [9] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
## [11] Biobase_2.50.0 GenomicRanges_1.42.0
## [13] GenomeInfoDb_1.26.4 IRanges_2.24.1
## [15] S4Vectors_0.28.1 BiocGenerics_0.36.0
## [17] MatrixGenerics_1.2.1 matrixStats_0.58.0
## [19] BiocStyle_2.18.1
##
## loaded via a namespace (and not attached):
## [1] AnnotationHub_2.22.0 BiocFileCache_1.14.0
## [3] systemfonts_1.0.1 RcppEigen_0.3.3.9.1
## [5] plyr_1.8.6 assertive.files_0.0-2
## [7] enrichR_3.0 multipanelfigure_2.1.2
## [9] BiocParallel_1.24.1 digest_0.6.27
## [11] foreach_1.5.1 htmltools_0.5.1.1
## [13] viridis_0.5.1 magick_2.7.0
## [15] fansi_0.4.2 magrittr_2.0.1
## [17] memoise_2.0.0 assertive.numbers_0.0-2
## [19] doParallel_1.0.16 pkgdown_1.6.1
## [21] colorspace_2.0-0 blob_1.2.1
## [23] rappdirs_0.3.3 ggrepel_0.9.1
## [25] textshaping_0.3.5 xfun_0.22
## [27] dplyr_1.0.5 crayon_1.4.1
## [29] RCurl_1.98-1.2 jsonlite_1.7.2
## [31] iterators_1.0.13 glue_1.4.2
## [33] gtable_0.3.0 zlibbioc_1.36.0
## [35] XVector_0.30.0 BiocSingular_1.6.0
## [37] Rhdf5lib_1.12.1 scales_1.1.1
## [39] DBI_1.1.1 Rcpp_1.0.6
## [41] viridisLite_0.3.0 xtable_1.8-4
## [43] gridGraphics_0.5-1 bit_4.0.4
## [45] rsvd_1.0.3 httr_1.4.2
## [47] RColorBrewer_1.1-2 ellipsis_0.3.1
## [49] farver_2.1.0 pkgconfig_2.0.3
## [51] scuttle_1.0.4 sass_0.3.1
## [53] uwot_0.1.10 dbplyr_2.1.0
## [55] utf8_1.2.1 labeling_0.4.2
## [57] tidyselect_1.1.0 rlang_0.4.10
## [59] reshape2_1.4.4 later_1.1.0.1
## [61] AnnotationDbi_1.52.0 munsell_0.5.0
## [63] BiocVersion_3.12.0 tools_4.0.4
## [65] cachem_1.0.4 dbscan_1.1-6
## [67] generics_0.1.0 RSQLite_2.2.4
## [69] ExperimentHub_1.16.0 evaluate_0.14
## [71] stringr_1.4.0 fastmap_1.1.0
## [73] yaml_2.2.1 ragg_1.1.3
## [75] knitr_1.31 bit64_4.0.5
## [77] fs_1.5.0 purrr_0.3.4
## [79] sparseMatrixStats_1.2.1 mime_0.10
## [81] compiler_4.0.4 beeswarm_0.3.1
## [83] curl_4.3 interactiveDisplayBase_1.28.0
## [85] tibble_3.1.0 bslib_0.2.4
## [87] stringi_1.5.3 highr_0.8
## [89] RSpectra_0.16-0 desc_1.3.0
## [91] lattice_0.20-41 assertive.base_0.0-9
## [93] vctrs_0.3.6 pillar_1.5.1
## [95] lifecycle_1.0.0 rhdf5filters_1.2.0
## [97] BiocManager_1.30.10 combinat_0.0-8
## [99] jquerylib_0.1.3 RcppAnnoy_0.0.18
## [101] BiocNeighbors_1.8.2 data.table_1.14.0
## [103] bitops_1.0-6 irlba_2.3.3
## [105] httpuv_1.5.5 assertive.types_0.0-3
## [107] R6_2.5.0 bookdown_0.21
## [109] assertive.properties_0.0-4 promises_1.2.0.1
## [111] gridExtra_2.3 vipor_0.4.5
## [113] codetools_0.2-18 MCMCprecision_0.4.0
## [115] assertthat_0.2.1 rprojroot_2.0.2
## [117] rjson_0.2.20 withr_2.4.1
## [119] GenomeInfoDbData_1.2.4 grid_4.0.4
## [121] beachmat_2.6.4 rmarkdown_2.7
## [123] DelayedMatrixStats_1.12.3 Rtsne_0.15
## [125] shiny_1.6.0 ggbeeswarm_0.6.0
“celda” stands for “CEllular Latent Dirichlet Allocation”. It is a suite of Bayesian hierarchical models and supporting functions to perform gene and cell clustering for count data generated by single cell RNA-seq platforms. This algorithm is an extension of the Latent Dirichlet Allocation (LDA) topic modeling framework that has been popular in text mining applications. This package also includes a method called decontX which can be used to estimate and remove contamination in single cell genomic data.
To install the latest stable release of celda from Bioconductor (requires R version >= 3.6):
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("celda")
The latest stable version of celda can be installed from GitHub using devtools:
library(devtools)
install_github("campbio/celda")
The development version of celda can also be installed from GitHub using devtools:
library(devtools)
install_github("campbio/celda@devel")
NOTE: For macOS users, devtools::install_github() requires installation of libgit2. This can be installed via Homebrew:
brew install libgit2
If you see the error 'wchar.h' file not found, you can try the method in this link:
If you see the error could not find tools necessary to compile a package, you can try typing this before running the install command:
options(buildtools.check = function(action) TRUE)
CEllular Latent Dirichlet Allocation (celda) is a collection of Bayesian hierarchical models to perform feature and cell bi-clustering for count data generated by single-cell platforms. This algorithm is an extension of the Latent Dirichlet Allocation (LDA) topic modeling framework that has been popular in text mining applications and has shown good performance with sparse data. celda simultaneously clusters features (i.e. gene expression) into modules based on co-expression patterns across cells and cells into subpopulations based on the probabilities of the feature modules within each cell.
Starting from Bioconductor release 3.12 (celda version
1.6.0), celda makes use of SingleCellExperiment
(SCE) objects for storing data and results. In this vignette we will
demonstrate how to use celda to perform cell and feature clustering with
a simple, small simulated dataset. This vignette does not include
upstream importing of data, quality control, or filtering. To see a more
complete analysis of larger real-world datasets, visit camplab.net/celda for
additional vignettes.
celda can be installed from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("celda")
To load the package, type the following:
library(celda)
A complete list of help files is accessible using the help command
with the package option.
help(package = celda)
To see the latest updates and releases or to report a bug, see our GitHub page at https://github.com/campbio/celda. To ask questions about running celda, post a thread on the Bioconductor support site at https://support.bioconductor.org/.
celda will take a matrix of counts where each row is a feature, each column is a cell, and each entry in the matrix is the number of counts of each feature in each cell. To illustrate the utility of celda, we will apply it to a simulated dataset.
In the function simulateCells, the K
parameter designates the number of cell clusters, the L
parameter determines the number of feature modules, the
S parameter determines the number of samples in the
simulated dataset, the G parameter determines the
number of features to be simulated, and CRange
specifies the lower and upper bounds of the number of cells to be
generated in each sample.
To simulate a dataset of 5 samples with 5 cell populations, 10
feature modules, 200 features, and between 30 to 50 cells per sample
using the celda_CG model:
simsce <- simulateCells("celda_CG",
S = 5, K = 5, L = 10, G = 200, CRange = c(30, 50))
The counts assay slot in simsce contains
the counts matrix. The dimensions of the counts matrix:
## [1] 200 207
Columns celda_sample_label and
celda_cell_cluster in colData(simsce) contain
sample labels and celda cell population cluster labels. Here are the
numbers of cells in each subpopulation and in each sample:
##
## 1 2 3 4 5
## 42 44 40 47 34
##
## Sample_1 Sample_2 Sample_3 Sample_4 Sample_5
## 43 48 45 40 31
Column celda_feature_module in
rowData(simsce) contains feature module labels. Here is the
number of features in each feature module:
##
## 1 2 3 4 5 6 7 8 9 10
## 23 39 17 15 21 22 19 12 4 28
A simple heuristic feature selection is performed to reduce the number
of features used for clustering. To speed up the process, only features
with at least 3 counts in at least 3 cells are included in downstream
clustering for this data. A subset SingleCellExperiment
object with filtered features is stored in
altExp(simsce, "featureSubset") slot by default.
simsce <- selectFeatures(simsce)
If the number of features is still too large, then a smaller subset of features can be obtained by selecting the most variable genes. For example code, see the PBMC3K tutorial in the online celda documentation.
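The "at least 3 counts in at least 3 cells" rule described above can be written in a couple of lines of base R. The sketch below is an illustration of that rule on a made-up matrix, not the internals of selectFeatures; the variable names `min_count` and `min_cell` are assumptions chosen for this example.

```r
# Hypothetical counts matrix: 4 features (rows) x 5 cells (columns)
counts <- matrix(c(0, 3, 5, 4, 0,   # Gene_1: >= 3 counts in 3 cells
                   1, 1, 2, 0, 1,   # Gene_2: never reaches 3 counts
                   9, 0, 0, 0, 7,   # Gene_3: >= 3 counts in only 2 cells
                   3, 3, 3, 3, 3),  # Gene_4: >= 3 counts in all 5 cells
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("Gene_", 1:4),
                                 paste0("Cell_", 1:5)))

min_count <- 3  # minimum counts required in a cell
min_cell  <- 3  # minimum number of cells that must meet min_count

# Count, for each feature, the cells meeting the threshold, then filter
keep <- rowSums(counts >= min_count) >= min_cell
filtered <- counts[keep, , drop = FALSE]

print(rownames(filtered))
```

Note that Gene_3 is dropped even though its total count is high, because the rule looks at how many cells express the feature rather than at the overall abundance.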
There are currently three models within the celda package:
celda_C will cluster cells, celda_G will
cluster features, and celda_CG will simultaneously cluster
cells and features. Within the functions the K parameter
will be the number of cell populations to be estimated, while the
L parameter will be the number of feature modules to be
estimated in the output model.
sce <- celda_CG(x = simsce, K = 5, L = 10, verbose = FALSE, nchains = 1)
Here is a comparison between the true cluster labels and the estimated cluster labels.
table(celdaClusters(sce), celdaClusters(simsce))
##
##      1  2  3  4  5
##   1  0 44  0  0  0
##   2 42  0  0  0  0
##   3  0  0 40  0  0
##   4  0  0  0 47  0
##   5  0  0  0  0 34
table(celdaModules(sce), celdaModules(simsce))
##
##       1  2  3  4  5  6  7  8  9 10
##   1   0 32  0  0  0  0  0  0  0  0
##   2  19  0  0  0  0  0  0  0  0  0
##   3   0  0 15  0  0  0  0  0  0  0
##   4   0  0  0 13  0  0  0  0  0  0
##   5   0  0  0  0 21  0  0  0  0  0
##   6   0  0  0  0  0 19  0  0  0  0
##   7   0  0  0  0  0  0  0 12  0  0
##   8   0  0  0  0  0  0 17  0  0  0
##   9   0  0  0  0  0  0  0  0  3  0
##   10  0  0  0  0  0  0  0  0  0 20
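Because cluster numbers are arbitrary, estimated labels can agree perfectly with the truth even when the numbers differ (for example, estimated cluster 1 corresponding to true cluster 2). The base R sketch below shows one simple way to check agreement up to relabeling; the label vectors are made up for illustration, and the purity score is a generic summary rather than a celda function.

```r
# Hypothetical true and estimated labels for 8 cells; the estimated labels
# use different numbers for the same groups
truth     <- c(1, 1, 1, 2, 2, 3, 3, 3)
estimated <- c(2, 2, 2, 1, 1, 3, 3, 3)

tab <- table(estimated, truth)

# A perfect match up to relabeling has exactly one non-zero entry
# in every row and every column of the contingency table
one_to_one <- all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)

# Purity: fraction of items that fall in the largest entry of the row
# belonging to their estimated cluster
purity <- sum(apply(tab, 1, max)) / sum(tab)

print(tab)
print(one_to_one)
print(purity)
```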
celda contains its own wrapper function for tSNE and UMAP called
celdaTsne and celdaUmap, respectively. Both of
these functions can be used to embed cells into two dimensions. The output
can be used in the downstream plotting functions
plotDimReduceCluster, plotDimReduceModule, and
plotDimReduceFeature to show cell population clusters,
module probabilities, and expression of individual features,
respectively.
sce <- celdaUmap(sce)
plotDimReduceCluster(x = sce, reducedDimName = "celda_UMAP")
plotDimReduceModule(x = sce, reducedDimName = "celda_UMAP", rescale = TRUE)
plotDimReduceFeature(x = sce, reducedDimName = "celda_UMAP",
normalize = TRUE, features = "Gene_1")
The clustering results can be viewed with a heatmap of the normalized
counts using the function celdaHeatmap. The top
nfeatures in each module will be selected according to the
factorized module probability matrix.
plot(celdaHeatmap(sce = sce, nfeatures = 10))
The relationships between feature modules and cell populations can be
visualized with celdaProbabilityMap. The absolute
probabilities of each feature module in each cellular subpopulation are
shown on the left. The normalized and z-scored expression of each module
in each cell population is shown on the right.
celdaProbabilityMap(sce)
moduleHeatmap creates a heatmap using only the features
from a specific feature module. Cells are ordered from those with the
lowest probability of the module to the highest. If more than one module
is used, then cells will be ordered by the probabilities of the first
module.
moduleHeatmap(sce, featureModule = c(1,2), topCells = 100)
In the previous example, the best K (the number of cell
clusters) and L (the number of feature modules) were already
known. However, the optimal K and L for each
new dataset will likely not be known beforehand and multiple choices of
K and L may need to be tried and compared.
celda offers two sets of functions to determine the optimum
K and L:
recursiveSplitModule/recursiveSplitCell and
celdaGridSearch.
Functions recursiveSplitModule and
recursiveSplitCell offer a fast method to generate a celda
model with optimum K and L. First,
recursiveSplitModule is used to determine the optimal
L. recursiveSplitModule first splits features
into however many modules are specified in initialL. The
module labels are then recursively split in a way that would generate
the highest log-likelihood, all the way up to maxL.
moduleSplit <- recursiveSplitModule(simsce, initialL = 2, maxL = 15)
Perplexity is a statistical measure of how well a probability model
can predict new data. Lower perplexity indicates a better model. The
perplexity of each model can be visualized with
plotGridSearchPerplexity. In general, visual inspection of
the plot can be used to select the optimal number of modules
(L) or cell populations (K) by identifying the
“elbow” - where the rate of decrease in the perplexity starts to drop
off.
plotGridSearchPerplexity(moduleSplit)
In this example, the perplexity for L stops decreasing
at L = 10, thus L = 10 would be a good choice. Sometimes the perplexity
alone does not show a clear elbow or “leveling off”. However, the rate
of perplexity change (RPC) can be more informative to determine when
adding new modules does not add much additional information (Zhao
et al., 2015). An RPC closer to zero indicates that the addition of
new modules or cell clusters is not substantially decreasing the
perplexity. The RPC of models can be visualized using function
plotRPC:
plotRPC(moduleSplit)
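The idea behind the RPC can be sketched numerically. The base R example below uses made-up perplexity values and one common definition of the RPC (absolute change in perplexity divided by the change in L between consecutive models). The tolerance of 0.5 is an arbitrary illustrative choice, and the calculation is not guaranteed to match the internals of plotRPC.

```r
# Hypothetical perplexity values for models with increasing L
L          <- c(2, 4, 6, 8, 10, 12, 14)
perplexity <- c(120, 95, 80, 70, 64, 63.5, 63.2)

# Rate of perplexity change between consecutive models:
# absolute change in perplexity divided by the change in L
rpc <- abs(diff(perplexity)) / diff(L)
names(rpc) <- paste0(L[-length(L)], "->", L[-1])
print(round(rpc, 2))

# Illustrative rule: pick the first L after which adding more modules
# changes the perplexity by less than a chosen tolerance per module
tolerance <- 0.5
chosen_L <- L[which(rpc < tolerance)[1]]
print(chosen_L)
```

In practice the plots should still be inspected visually, since a fixed tolerance will not suit every dataset.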
Once you have identified the optimal L (in this case, L
is selected to be 10), the module labels are used for initialization in
recursiveSplitCell. Similarly to
recursiveSplitModule, cells are initially split into a
small number of subpopulations, and the subpopulations are recursively
split up.
moduleSplitSelect <- subsetCeldaList(moduleSplit, params = list(L = 10))
cellSplit <- recursiveSplitCell(moduleSplitSelect,
    initialK = 3,
    maxK = 12,
    yInit = celdaModules(moduleSplitSelect))
plotGridSearchPerplexity(cellSplit)
plotRPC(cellSplit)
In this plot, the perplexity for K stops decreasing at K = 5, with a
final K/L combination of K = 5, L = 10. Generally, this method can be
used to pick a reasonable L and a potential range of
K. However, manual review of specific selections of
K is often required to ensure results are biologically
coherent.
Once users have chosen the K/L parameters for further analysis, the
subsetCeldaList function can be used to subset the celda
list SCE object to a single model SCE object.
sce <- subsetCeldaList(cellSplit, params = list(K = 5, L = 10))
As an alternative to recursive splitting, celda is able to run multiple
combinations of K and L with multiple chains in parallel via the
celdaGridSearch function.
cgs <- celdaGridSearch(simsce,
    paramsTest = list(K = seq(4, 6), L = seq(9, 11)),
    cores = 1,
    model = "celda_CG",
    nchains = 2,
    maxIter = 100,
    verbose = FALSE,
    bestOnly = TRUE)
Setting verbose to TRUE will print the
output of each model to a text file. These results can be visualized
with plotGridSearchPerplexity. The major goal is to pick
the lowest K and L combination with relatively
good perplexity. In general, visual inspection of the plot can be used
to select the number of modules (L) or cell populations
(K) where the rate of decrease in the perplexity starts to
drop off. bestOnly = TRUE indicates that only the chain
with the best log likelihood will be returned for each K/L
combination.
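The following base R sketch illustrates the grid that a search like this covers, and what keeping only the best chain per K/L combination means. It does not call celda. The log-likelihood values are simulated placeholders, and the variable names runs and best are hypothetical.

```r
# Same parameter ranges and chain count as in the call above
paramsTest <- list(K = seq(4, 6), L = seq(9, 11))
nchains <- 2

# Every K/L combination, repeated once per chain
grid <- expand.grid(paramsTest)
runs <- grid[rep(seq_len(nrow(grid)), each = nchains), ]
runs$chain <- rep(seq_len(nchains), times = nrow(grid))
rownames(runs) <- NULL

# Placeholder log-likelihoods (simulated, not produced by a celda model)
set.seed(1)
runs$logLik <- -1000 + rnorm(nrow(runs))

# Keep the chain with the highest log-likelihood within each K/L combination
groups <- split(runs, list(runs$K, runs$L), drop = TRUE)
best <- do.call(rbind, lapply(groups, function(d) d[which.max(d$logLik), ]))

nrow(grid)   # number of K/L combinations
nrow(runs)   # number of model fits (combinations x chains)
nrow(best)   # one retained chain per combination
```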

In this example, the perplexity for L stops decreasing
at L = 10 for the majority of K values. For the line
corresponding to L = 10, the perplexity stops decreasing at K = 5. Thus
L = 10 and K = 5 would be a good choice. Again, manual review of
specific selections of K is often required to ensure results are
biologically coherent.
Once users have chosen the K/L parameters for further analysis, the
subsetCeldaList function can be used to subset the celda
list SCE object to a single model SCE object.
sce <- subsetCeldaList(cgs, params = list(K = 5, L = 10))
If the bestOnly parameter is set to FALSE in
celdaGridSearch, then the selectBestModel
function can be used to select the chain with the best log
likelihood within each combination of parameters. Alternatively, users
can select a specific chain by specifying the index within the
subsetCeldaList function.
cgs <- celdaGridSearch(simsce,
    paramsTest = list(K = seq(4, 6), L = seq(9, 11)),
    cores = 1,
    model = "celda_CG",
    nchains = 2,
    maxIter = 100,
    verbose = FALSE,
    bestOnly = FALSE)
cgs <- resamplePerplexity(cgs, celdaList = cgs, resample = 2)
cgsK5L10 <- subsetCeldaList(cgs, params = list(K = 5, L = 10))
sce <- selectBestModel(cgsK5L10)
celda also contains several utility functions for the users’ convenience.
featureModuleLookup can be used to look up the module a
specific feature was clustered to.
featureModuleLookup(sce, feature = c("Gene_99"))
## Gene_99
##       4
recodeClusterZ and recodeClusterY allow
the user to recode the cell and feature cluster labels,
respectively.
sceZRecoded <- recodeClusterZ(sce,
    from = c(1, 2, 3, 4, 5), to = c(2, 1, 3, 4, 5))
The model prior to reordering cell labels compared to after reordering cell labels:
table(celdaClusters(sce), celdaClusters(sceZRecoded))
##
##      1  2  3  4  5
##   1  0 44  0  0  0
##   2 42  0  0  0  0
##   3  0  0 40  0  0
##   4  0  0  0 47  0
##   5  0  0  0  0 34
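The relabeling itself can be sketched in base R on a plain vector of cluster labels. This only illustrates the from/to mapping idea and is not the implementation of recodeClusterZ. The vector z is a made-up example.

```r
# Hypothetical cell cluster labels
z <- c(1, 2, 2, 3, 1, 5, 4)

# Same mapping as above: swap labels 1 and 2, leave the rest unchanged
from <- c(1, 2, 3, 4, 5)
to <- c(2, 1, 3, 4, 5)

# Look up each label's position in `from` and take the matching entry of `to`
zRecoded <- to[match(z, from)]

# Cross-tabulate old labels (rows) against new labels (columns)
table(z, zRecoded)
```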
## R version 4.3.3 (2024-02-29)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.4.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] celda_1.18.2 Matrix_1.6-5
## [3] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
## [5] Biobase_2.62.0 GenomicRanges_1.54.1
## [7] GenomeInfoDb_1.38.8 IRanges_2.36.0
## [9] S4Vectors_0.40.2 BiocGenerics_0.48.1
## [11] MatrixGenerics_1.14.0 matrixStats_1.2.0
## [13] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7 gridExtra_2.3 rlang_1.1.3
## [4] magrittr_2.0.3 clue_0.3-65 GetoptLong_1.0.5
## [7] compiler_4.3.3 png_0.1-8 systemfonts_1.0.6
## [10] vctrs_0.6.5 reshape2_1.4.4 combinat_0.0-8
## [13] stringr_1.5.1 shape_1.4.6.1 pkgconfig_2.0.3
## [16] crayon_1.5.2 fastmap_1.1.1 magick_2.8.2
## [19] XVector_0.42.0 labeling_0.4.3 utf8_1.2.4
## [22] rmarkdown_2.25 ragg_1.3.0 purrr_1.0.2
## [25] xfun_0.41 WriteXLS_6.5.0 zlibbioc_1.48.2
## [28] cachem_1.0.8 jsonlite_1.8.8 highr_0.10
## [31] DelayedArray_0.28.0 cluster_2.1.6 irlba_2.3.5.1
## [34] parallel_4.3.3 R6_2.5.1 bslib_0.6.1
## [37] stringi_1.8.3 RColorBrewer_1.1-3 MCMCprecision_0.4.0
## [40] jquerylib_0.1.4 Rcpp_1.0.12 bookdown_0.37
## [43] iterators_1.0.14 knitr_1.45 FNN_1.1.4
## [46] tidyselect_1.2.0 rstudioapi_0.15.0 abind_1.4-5
## [49] yaml_2.3.8 enrichR_3.2 doParallel_1.0.17
## [52] codetools_0.2-19 curl_5.2.1 lattice_0.22-5
## [55] tibble_3.2.1 plyr_1.8.9 withr_3.0.0
## [58] evaluate_0.23 Rtsne_0.17 desc_1.4.3
## [61] circlize_0.4.16 RcppEigen_0.3.4.0.0 pillar_1.9.0
## [64] BiocManager_1.30.22 foreach_1.5.2 generics_0.1.3
## [67] RCurl_1.98-1.14 ggplot2_3.5.0 munsell_0.5.1
## [70] scales_1.3.0 glue_1.7.0 tools_4.3.3
## [73] data.table_1.15.4 fs_1.6.3 Cairo_1.6-2
## [76] grid_4.3.3 colorspace_2.1-0 GenomeInfoDbData_1.2.11
## [79] cli_3.6.2 textshaping_0.3.7 fansi_1.0.6
## [82] S4Arrays_1.2.1 ComplexHeatmap_2.18.0 dplyr_1.1.4
## [85] uwot_0.1.16 gtable_0.3.4 sass_0.4.8
## [88] digest_0.6.35 SparseArray_1.2.4 ggrepel_0.9.5
## [91] rjson_0.2.21 farver_2.1.1 memoise_2.0.1
## [94] htmltools_0.5.7 pkgdown_2.0.7 lifecycle_1.0.4
## [97] httr_1.4.7 GlobalOptions_0.1.2
vignettes/articles/celda_pbmc3k.Rmd
Celda is a Bayesian hierarchical model that can perform bi-clustering of features into modules and observations into subpopulations. In this tutorial, we will apply Celda to a real-world single-cell RNA sequencing (scRNA-seq) dataset of 2,700 Peripheral Blood Mononuclear Cells (PBMCs) collected from a healthy donor. This dataset (PBMC3K) is available from 10X Genomics and can be found on the 10X website.
The celda package uses the SingleCellExperiment (SCE) object for management of
expression matrices, feature/cell annotation data, and metadata. All of
the functions have an SCE object as the first input parameter. The
functions operate on a matrix stored in the assay slot of
the SCE object. The parameter useAssay can be used to
specify which matrix to use (the default is "counts").
Matrices can be of class matrix or dgCMatrix
from the Matrix
package. While the primary clustering is performed with functions from
the celda package, the singleCellTK
package is used for some other tasks such as importing data, quality
control, and marker identification with differential expression.
The PBMC3K data can be easily loaded via the Bioconductor package TENxPBMCData. TENxPBMCData is an
experiment package that provides resources for various PBMC datasets
generated by 10X Genomics. When using this package, the column names of
the returned SCE object are NULL by default. For this example,
we paste together the name of the sample with the cell barcode to
generate column names for the SCE object. Additionally, the count matrix
within the sce object is converted from a
DelayedMatrix object to a sparse matrix
dgCMatrix object.
library(TENxPBMCData)
sce <- TENxPBMCData("pbmc3k")
colnames(sce) <- paste0("pbmc3k_", colData(sce)$Sequence)
counts(sce) <- as(counts(sce), "dgCMatrix")
If you have the singleCellTK package installed, then this dataset can be imported and converted with a single command:
library(singleCellTK)
sce <- importExampleData("pbmc3k")
To get your own data into a SingleCellExperiment object,
the singleCellTK package has several importing functions
for different preprocessing tools including CellRanger, STARsolo,
BUStools, Optimus, DropEST, SEQC, and Alevin/Salmon. For example, the
following code can be used as a template to read in multiple samples
processed with CellRanger:
library(singleCellTK)
sce <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"))
Note: As a reminder, you can view the assays, column
annotation, and row annotation stored in the SCE with the commands
assays(sce), colData(sce), and
rowData(sce), respectively.
Finally, we set the rownames of the SCE to the gene
symbol:
Quality control and filtering of cells is often needed before down-stream analyses such as dimensionality reduction and clustering. Typical filtering procedures include exclusion of poor quality cells with low numbers of counts/UMIs, estimation and removal of ambient RNA, and identification of potential doublet/multiplets. Many tools and packages are available to perform these operations and users are free to apply their tool(s) of choice as the celda clustering functions will work with any matrix stored in an SCE object. The celda package does contain a Bayesian method called decontX to estimate and remove transcript contamination in individual cells in a scRNA-seq dataset.
To perform QC, we suggest using the runCellQC function
in singleCellTK package. This is a wrapper for several
methods for calculation of QC metrics, doublet detection, and estimation
of ambient RNA (including decontX). Below is a quick example of how to
perform standard QC before applying celda. If you have another preferred
approach or your data has already been QC’ed, you can move to the Feature selection section. For this
tutorial, we will only run one doublet detection algorithm and one
decontamination algorithm. For a full list of algorithms that this
function runs by default, see ?runCellQC. We will also
quantify the percentage of mitochondrial genes in each cell as this is
often used as a measure of cell viability.
library(singleCellTK)
# Get list of mitochondrial genes
mito.genes <- grep("^MT-", rownames(sce), value = TRUE)
# Run QC
sce <- runCellQC(sce, sample = NULL,
    algorithms = c("QCMetrics", "scDblFinder", "decontX"),
    geneSetList = list(mito = mito.genes),
    geneSetListLocation = "rownames")
Note: If you have cells from multiple samples stored
in the SCE object, make sure to supply the sample parameter
as the QC tools need to be applied to cells from each sample
individually.
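As a rough illustration of the mitochondrial metric, the sketch below computes the percentage of counts coming from "MT-" genes on a small made-up matrix using base R only. It assumes the percentage is defined as 100 times the mitochondrial counts divided by the total counts per cell. The matrix and its gene names are invented for the example.

```r
# Toy count matrix: 4 genes (rows) x 3 cells (columns)
counts <- matrix(c(1, 1, 4, 4,
                   5, 5, 5, 5,
                   0, 0, 3, 7),
                 nrow = 4,
                 dimnames = list(c("MT-CO1", "MT-ND1", "CD3D", "LYZ"),
                                 c("cell1", "cell2", "cell3")))

# Same pattern match as above: gene symbols starting with "MT-"
mito.genes <- grep("^MT-", rownames(counts), value = TRUE)

# Percentage of each cell's counts that come from mitochondrial genes
mitoPercent <- 100 * colSums(counts[mito.genes, , drop = FALSE]) / colSums(counts)
mitoPercent
```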
Individual sets of QC metrics can be plotted with specific functions.
For example, to plot distributions of total numbers of UMIs derived from
runPerCellQC, doublet scores from
runScDblFinder, and contamination scores from
runDecontX (all of which were run by the
runCellQC function), the following plotting functions can
be used:

plotScDblFinderResults(sce, reducedDimName = "decontX_UMAP")
plotDecontXResults(sce, reducedDimName = "decontX_UMAP")
A comprehensive HTML report can be generated to visualize and explore the QC metrics in greater detail:
reportCellQC(sce)
After examining the distributions of various QC metrics, poor quality
cells will need to be removed. Typically, thresholds for QC metrics
should exclude cells that are outliers of the distribution (i.e. long
tails in the violin or density plots). Cells can be removed using the
subsetSCECols function. Metrics stored in the
colData of the SCE object can be filtered using the
colData parameter. Here we will keep only cells with more than
600 counts and more than 300 genes detected:
# Filter SCE
sce <- subsetSCECols(sce, colData = c("total > 600", "detected > 300"))
# See number of cells after filtering
ncol(sce)
## [1] 2675
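The same thresholds can be sketched on a plain matrix with base R, which may help clarify what the two metrics measure. This assumes that total is the sum of counts per cell and detected is the number of genes with a non-zero count. The toy matrix is invented for illustration.

```r
# Toy count matrix: 400 genes (rows) x 3 cells (columns)
nGenes <- 400
counts <- cbind(cellA = rep(2, nGenes),                  # 800 counts, 400 genes
                cellB = c(rep(5, 200), rep(0, 200)),     # 1000 counts, 200 genes
                cellC = rep(1, nGenes))                  # 400 counts, 400 genes
rownames(counts) <- paste0("Gene_", seq_len(nGenes))

# Per-cell QC metrics
total <- colSums(counts)         # total counts per cell
detected <- colSums(counts > 0)  # number of genes with at least one count

# Keep cells passing both thresholds
keep <- total > 600 & detected > 300
filtered <- counts[, keep, drop = FALSE]
colnames(filtered)
```

Only cellA passes both filters: cellB has too few detected genes and cellC has too few counts.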
Other common metrics to filter on include
subsets_mito_percent for removal of cells with high
mitochondrial percentage, decontX_contamination for removal
of cells with higher levels of contamination from ambient RNA, and
scDblFinder_class for removal of doublets (or calls from any of
the other doublet detection algorithms). See the
singleCellTK documentation
for more information on performing comprehensive QC and filtering.
In general, removing features with low numbers of counts across all
cells is recommended to reduce computational run time. A simple
selection can be performed by keeping only features with a minimum number of
counts in a minimum number of cells using the
selectFeatures function:
# Select features with at least 3 counts in at least 3 cells
library(celda)
useAssay <- "counts"
altExpName <- "featureSubset"
sce <- selectFeatures(sce, minCount = 3, minCell = 3, useAssay = useAssay, altExpName = altExpName)
# See number of features after filtering
nrow(altExp(sce, altExpName))
## [1] 2639
The useAssay parameter is used to denote which
assay/matrix within the SCE to use for filtering. The default raw counts
matrix is traditionally stored in the "counts" assay. If
decontX was previously run during QC, then the
decontaminated counts can be used by setting this parameter to
"decontXcounts". We will save this parameter in a variable
called useAssay which will be used as input in several
downstream functions.
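The rule "at least 3 counts in at least 3 cells" can be written in one line of base R. The sketch below applies it to a small invented matrix. It reflects one reading of the rule and is not taken from the selectFeatures source.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns)
counts <- matrix(c( 3,  4, 5, 0,
                   10, 10, 0, 0,
                    1,  1, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                 paste0("cell", 1:4)))

minCount <- 3
minCell <- 3

# For each gene, count the cells with at least minCount counts,
# then keep genes where that number reaches minCell
keep <- rowSums(counts >= minCount) >= minCell
selected <- counts[keep, , drop = FALSE]
rownames(selected)
```

GeneB is dropped even though it has the most counts overall, because only two cells reach the per-cell minimum.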
Note: The subsetted matrix is stored in the
“alternative experiment” slot (altExp) within the SCE. This
allows for a matrix with a different number of rows to be stored within
the same SCE object (rather than creating two SCE objects). The celda
functions described in the next several sections operate on a matrix
stored in the altExp slot. The default name given to the
alternative experiment and used in all downstream celda functions is
"featureSubset". If the altExpName parameter
is changed here, then it will need to be supplied to downstream plotting
functions as well. The list of alternative experiments in an SCE can be
viewed with altExpNames(sce). If you already have an SCE
with selected features or do not want to perform feature selection, then
you need to set the alternative experiment directly with a command like
altExp(sce, "featureSubset") <- assay(sce, "counts"). In
the future, this will be updated to be simpler by utilizing the
ExperimentSubset package.
If the number of features is still relatively large (e.g. >5000),
an alternative approach is to select highly variable features that can
be used in the downstream clustering. The advantage of this approach is
that it can greatly speed up celda and can improve module detection
among highly variable features with overall lower expression. The
disadvantage of this approach is that features that do not fall into the
highly variable group will not be clustered into modules. The celda
package does not include methods for selection of highly variable genes
(HVGs). However, the singleCellTK package provides wrappers for
methods used in Seurat and Scran.
We recommend keeping at least 2,000-5,000 HVGs for clustering. Here is
some example code showing how to select the top 5,000 most variable genes and
store them back in the SCE as an altExp:
library(singleCellTK)
sce <- seuratFindHVG(sce, useAssay = useAssay, hvgMethod = "vst")
g <- getTopHVG(sce, method = "vst", n = 5000)
altExp(sce, altExpName) <- sce[g, ]
For the rest of the analysis with the PBMC3K data, we will use the first approach where features with at least 3 counts in at least 3 cells were included.
As mentioned earlier, celda is a discrete Bayesian model that is able
to simultaneously bi-cluster features into modules and cells into cell
clusters. The primary bi-clustering model can be accessed with the
function celda_CG. This function operates on a matrix
stored as an alternative experiment in the altExp slot. If
you did not perform feature selection as recommended in the previous
section and your matrix of interest is not currently located in an
altExp slot, the following code can be used to copy a
matrix in the main assay slot to the altExp slot:
useAssay <- "counts"
altExpName <- "featureSubset"
altExp(sce, altExpName) <- assay(sce, useAssay)
The two major adjustable parameters in this model are L,
the number of modules, and K, the number of cell
populations. The following code bi-clusters the PBMC3K dataset into 100
modules and 15 cell populations:
sce <- celda_CG(sce, L = 100, K = 15, useAssay = useAssay, altExpName = altExpName)
However, in most cases, the number of feature modules
(L) and the number of cell clusters (K) are
not known beforehand. In the next sections, we outline procedures that
can be used to suggest reasonable choices for these parameters. If the data
is clustered with the code above by supplying K and L directly to the
celda_CG function, then you can skip the next section and
proceed to Creating 2-D embeddings.
In order to help choose reasonable solutions for L and K, celda
provides step-wise splitting procedures along with measurements of
perplexity to suggest reasonable choices for L and
K. First, the function recursiveSplitModule
can be used to cluster features into modules for a range of
L. Within each step, the best split of an existing module
into 2 new modules is chosen to create the L-th module. The module
labels of the previous model with \(L-1\) modules are used as the initial
starting values in the next model with \(L\) modules. Note that the initialization
step may take longer with larger numbers of cells in the dataset and the
splitting procedure will take longer with larger numbers of features in the
dataset. Celda models with an L range between initialL = 10
and maxL = 150 are tested in the example below.
moduleSplit <- recursiveSplitModule(sce, useAssay = useAssay, altExpName = altExpName, initialL = 10, maxL = 150)
Perplexity has been commonly used in topic models to measure how
well a probabilistic model predicts observed samples (Blei
et al., 2003). Here, we use perplexity to evaluate the performance
of individual models by calculating the probability of observing
expression counts given an estimated Celda model. Rather than performing
cross-validation which is computationally expensive, a series of test
sets are created by sampling the counts from each cell according to a
multinomial distribution defined by dividing the counts for each gene in
the cell by the total number of counts for that cell. Perplexity is then
calculated on each test set and can be visualized using function
plotGridSearchPerplexity. A lower perplexity indicates a
better model fit.
plotGridSearchPerplexity(moduleSplit, altExpName = altExpName, sep = 10)
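The resampling and perplexity calculation described above can be sketched in base R. This is an illustration under stated assumptions, not celda's implementation. Perplexity is taken to be exp(-log-likelihood / total counts) with the multinomial coefficient omitted, and the two "models" compared (uniform and pooled gene probabilities) are hypothetical stand-ins for fitted celda models.

```r
set.seed(42)

# Toy count matrix: 5 genes (rows) x 4 cells (columns); +1 keeps counts positive
counts <- matrix(rpois(20, lambda = 5) + 1, nrow = 5, ncol = 4)

# Per-cell gene proportions: each gene's count divided by the cell's total
props <- sweep(counts, 2, colSums(counts), "/")

# One resampled test set: every cell is redrawn from its own multinomial
resampled <- sapply(seq_len(ncol(counts)), function(j) {
    as.vector(rmultinom(1, size = sum(counts[, j]), prob = props[, j]))
})

# Perplexity of counts x under per-cell gene probabilities p (assumed definition)
perplexity <- function(x, p) exp(-sum(x * log(p)) / sum(x))

# Two stand-in models: uniform over genes, and gene frequencies pooled over cells
uniform <- matrix(1 / nrow(counts), nrow(counts), ncol(counts))
pooled <- matrix(rowSums(counts) / sum(counts), nrow(counts), ncol(counts))

perplexity(resampled, uniform)
perplexity(resampled, pooled)
```

Under this definition a uniform model over G genes always has a perplexity of G, which gives a convenient upper reference point when comparing models.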
The perplexity alone often does not show a clear elbow or “leveling
off”. However, the rate of perplexity change (RPC) can be more
informative to determine when adding new modules does not add much
additional information (Zhao
et al., 2015). An RPC closer to zero indicates that the addition of
new modules or cell clusters is not substantially decreasing the
perplexity. The RPC of models can be visualized using function
plotRPC:
plotRPC(moduleSplit, altExpName = altExpName)
In this case, we will choose an L of 80 as the RPC curve
tends to level off at this point:
L <- 80
Note: The perplexity and RPC plots can help suggest a reasonable choice
of L. However, they may not always give a clear “leveling off”
depending on the complexity and quality of the dataset. Do not
give up if the choice of L is unclear or imperfect! If the
L to choose is unclear from these, then you can set a
somewhat high number (e.g. 75) and move to the next step of selecting
K. Later on, manual review of modules using functions such
as moduleHeatmap can give a sense of whether individual
modules should be further split up by selecting higher L.
For example, you can start exploring the cell populations and modules
with L = 75. If some modules need to be further split, you
can then try L = 100, L = 125, and so
on.
Now we extract the Celda model with L = 80 using the function
subsetCeldaList and run recursiveSplitCell to
fit models with a range of K between 3 and 25:
temp <- subsetCeldaList(moduleSplit, list(L = L))
sce <- recursiveSplitCell(sce, useAssay = useAssay, altExpName = altExpName, initialK = 3, maxK = 25, yInit = celdaModules(temp))
The perplexities and RPC of models can be visualized using the same
functions plotGridSearchPerplexity and
plotRPC.

plotRPC(sce, altExpName = altExpName)
The perplexity continues to decrease with larger values of
K. The RPC generally levels off between 13 and 16 and we
choose the model with K = 14 for downstream analysis. The
following code selects the final celda_CG model with
L = 80 and K = 14:
K <- 14
sce <- subsetCeldaList(sce, list(L = L, K = K))
Note: Similar to choosing L, you can
guess an initial value of K based off of the perplexity and
RPC plots and then move to the downstream exploratory analyses described
in the next several sections. After reviewing the cell clusters on 2-D
embeddings and module heatmaps, you may have to come back to tweak the
choice of K until you have something that captures the
cellular heterogeneity within the data without “over-clustering” cells
into too many subpopulations. This may be an iterative procedure of
going back-and-forth between choices of K and plotting the
results. So do not let imperfect perplexity/RPC plots prevent you from
moving on to the rest of the analysis. Often, using an initial
guess for K will allow you to move on in the analysis to
get a sense of the major sources of biological heterogeneity present in
the data.
After selecting a celda model with specific values of L
and K, we can then perform additional exploratory and
downstream analyses to understand the biology of the transcriptional
modules and cell populations. We can start by generating a dimension
reduction plot with the Uniform Manifold Approximation and Projection
(UMAP) method to visualize the relationships between the cells in a 2-D
embedding. This can be done with function celdaUmap.
sce <- celdaUmap(sce, useAssay = useAssay, altExpName = altExpName)
Alternatively, a t-distributed stochastic neighbor embedding (t-SNE)
can be generated using function celdaTsne. The UMAP and
t-SNE plots generated by celdaUmap and
celdaTsne are computed based on the module probabilities
(analogous to using PCs from PCA). The calculated dimension reduction
coordinates for the cells are stored under the reducedDim
slot of the altExp slot in the original SCE object. The
following command lists the names of the dimensionality reductions that can
be used in downstream plotting functions in the next few sections:
reducedDimNames(altExp(sce, altExpName))
## [1] "decontX_UMAP" "celda_UMAP"
The function plotDimReduceCluster can be used to plot
the cluster labels for cell populations identified by celda on the
UMAP:
plotDimReduceCluster(sce, reducedDimName = "celda_UMAP", labelClusters = TRUE)
Usually, biological features of some cell populations are known a
priori and can be identified with known marker genes. The
expression of selected marker genes can be plotted on the UMAP with the
function plotDimReduceFeature.
markers <- c("CD3D", "IL7R", "CD4", "CD8B", "CD19", "FCGR3A", "CD14", "FCER1A", "PF4")
plotDimReduceFeature(x = sce, features = markers, reducedDimName = "celda_UMAP", useAssay = useAssay, altExpName = altExpName, normalize = TRUE)
The parameter displayName can be used to switch between
IDs stored in the rownames of the SCE and columns of the
rowData of the SCE. If the assay denoted by
useAssay is a raw counts matrix, then setting
normalize = TRUE is recommended (otherwise the z-score of
the raw counts will be plotted). When set to TRUE, each
count will be normalized by dividing by the total number of counts in
each cell. An alternative approach is to perform normalization with
another method and then point to the normalized assay with the
useAssay parameter. For example, normalization can be
performed with the scater package:
library(scater)
sce <- logNormCounts(sce, exprs_values = useAssay, name = "logcounts")
plotDimReduceFeature(x = sce, features = markers, reducedDimName = "celda_UMAP", useAssay = "logcounts", altExpName = altExpName, normalize = FALSE)
This second approach may be faster if plotting a lot of marker genes or if the dataset is relatively large.
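For intuition, the normalization described for normalize = TRUE (dividing each count by the total counts in its cell) can be reproduced on a toy matrix in base R. The z-scoring step afterwards is included only to illustrate the scaling mentioned above. Whether the plotting function scales in exactly this way is an assumption.

```r
# Toy count matrix: 2 genes (rows) x 3 cells (columns)
counts <- matrix(c(1, 2, 0,
                   3, 2, 4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB"), paste0("cell", 1:3)))

# Divide each count by the total number of counts in its cell
normalized <- sweep(counts, 2, colSums(counts), "/")

# z-score each gene across cells (mean 0, standard deviation 1 per row)
zscored <- t(scale(t(normalized)))

normalized
zscored
```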
Once we have identified the various cell subpopulations using the known marker genes, custom labels can be added to the UMAP colored by cluster:
g <- plotDimReduceCluster(sce, reducedDimName = "celda_UMAP", altExpName = altExpName, labelClusters = TRUE)
labels <- c("1: Megakaryocytes",
    "2: CD14+ Monocytes 1",
    "3: CD14+ Monocytes 2",
    "4: FCGR3A (CD16+) Monocytes",
    "5: CD14+ Monocytes 3",
    "6: CD8+ Cytotoxic T-cells",
    "7: CD4+ T-cells",
    "8: CD8+ Cytotoxic T-cells",
    "9: B-cells",
    "10: Naive CD8+ T-cells",
    "11: Naive CD4+ T-cells",
    "12: NK-cells",
    "13: Unknown T-cells",
    "14: Dendritic cells")
library(ggplot2)
g <- g + scale_color_manual(labels = labels,
    values = distinctColors(length(labels)))
print(g)
Celda has the ability to identify modules of co-expressed features
and quantify the probability of these modules in each cell population.
An overview of the relationships between modules and cell subpopulations
can be explored with the function celdaProbabilityMap. The
“Absolute probability” heatmap on the left shows the proportion of
counts in each module for each cell population. The “Absolute
probability” map gives insights into the absolute abundance of a module
within a given cell subpopulation. The absolute heatmap can be used to
explore which modules are higher than other modules within a
cell population. The “Relative expression” map shows the
standardized (z-scored) module probabilities across cell subpopulations.
The relative heatmap can be used to explore which modules are relatively
higher than other modules across cell populations.
celdaProbabilityMap(sce, useAssay = useAssay, altExpName = altExpName)
In this plot, we can see a variety of patterns. Modules 15 - 20 are highly expressed across most cell populations, indicating that they may contain housekeeping genes (e.g. ribosomal). Other modules are specific to a cell population or groups of cell populations. For example, module 35 is only expressed in population 1 while module 70 is expressed across populations 2, 3, and to some degree in population 5. The unknown T-cell population 13 has highly specific levels of module 30. In the next section, we can look at the genes in these modules to gain insights into the biological properties of each of these cell populations.
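The difference between the absolute and relative views described above can be illustrated with a small base R sketch. It uses raw count proportions on an invented matrix with made-up module labels y and population labels z. The actual heatmaps are based on probabilities estimated by the celda model, so this is only an approximation of the idea.

```r
# Toy count matrix: 4 genes (rows) x 6 cells (columns)
counts <- matrix(c(9, 8, 1, 0, 5, 4,
                   7, 9, 0, 1, 4, 5,
                   1, 0, 8, 9, 5, 4,
                   0, 1, 9, 7, 4, 6),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("Gene_", 1:4), paste0("Cell_", 1:6)))

y <- c(1, 1, 2, 2)        # hypothetical module label for each gene
z <- c(1, 1, 2, 2, 3, 3)  # hypothetical population label for each cell

# Sum counts over genes within a module, then over cells within a population
moduleByCell <- rowsum(counts, group = y)
moduleByPop <- t(rowsum(t(moduleByCell), group = z))

# "Absolute": proportion of each population's counts that falls in each module
absolute <- sweep(moduleByPop, 2, colSums(moduleByPop), "/")

# "Relative": z-score each module (row) across populations
relative <- t(scale(t(absolute)))

round(absolute, 3)
round(relative, 3)
```

In the absolute view each column sums to one, so modules are compared within a population. In the relative view each row is centered, so a module is compared across populations.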
The primary advantage of celda over other tools is that it can cluster features that are co-expressed across cells into modules. These modules are often more biologically coherent than features correlated with principal components from PCA. Below are several ways in which modules can be explored and visualized.
The function featureModuleTable can be used to get the
names of all features in each module into a data.frame.
# Save to a data.frame
ta <- featureModuleTable(sce, useAssay = useAssay, altExpName = altExpName)
dim(ta)
## [1] 154 80
head(ta[,"L70"])
## [1] "S100A9" "S100A8" "S100A12" "RBP7" "FOLR3" "C19orf59"
The parameter displayName can be used to switch between
IDs stored in the rownames of the SCE and columns of the
rowData of the SCE. If the outputFile
parameter is set, the table will be saved to a tab-delimited text file
instead of to a data.frame:
# Save to file called "modules.txt"
featureModuleTable(sce, useAssay = useAssay, altExpName = altExpName, outputFile = "modules.txt")
The modules for this model are shown below:
| L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 | L9 | L10 | L11 | L12 | L13 | L14 | L15 | L16 | L17 | L18 | L19 | L20 | L21 | L22 | L23 | L24 | L25 | L26 | L27 | L28 | L29 | L30 | L31 | L32 | L33 | L34 | L35 | L36 | L37 | L38 | L39 | L40 | L41 | L42 | L43 | L44 | L45 | L46 | L47 | L48 | L49 | L50 | L51 | L52 | L53 | L54 | L55 | L56 | L57 | L58 | L59 | L60 | L61 | L62 | L63 | L64 | L65 | L66 | L67 | L68 | L69 | L70 | L71 | L72 | L73 | L74 | L75 | L76 | L77 | L78 | L79 | L80 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CCL3 | GNLY | CTSW | NKG7 | RPS19 | MT-CO2 | MT-CO3 | DDX5 | RPL28 | RPL18A | FOS | EIF1 | JUNB | GIMAP7 | RPL13A | RPS6 | RPS2 | RPL10 | RPL13 | RPS14 | RPSA | RPS27 | LTB | PTPRCAP | MALAT1 | LDHB | IL32 | CD79B | CD37 | TUBA1B | GAPDH | PPIA | ACTG1 | CCL5 | PPBP | RGS10 | OAZ1 | TAGLN2 | MT-ND1 | MT-CO1 | ARPC3 | SH3BGRL3 | CYBA | PTMA | TMSB10 | LAPTM5 | ARHGDIB | HLA-B | CFL1 | SRGN | ACTB | TMSB4X | C9orf142 | ANXA1 | UBB | B2M | MYL12A | HLA-A | FCGR3A | IFITM2 | FAM26F | FCER1G | AIF1 | FTH1 | FCER1A | HLA-DQA1 | HLA-DPB1 | CD74 | HLA-DRA | S100A9 | LYZ | CST3 | VIM | NEAT1 | S100A4 | GSTP1 | LGALS1 | GABARAP | TYROBP | FTL |
| IGFBP7 | GZMB | CD247 | GZMA | NACA | CD52 | MT-ND4 | TSC22D3 | RPS9 | RPL12 | FXYD5 | H3F3B | TMEM66 | GIMAP4 | RPS18 | RPS3 | RPL19 | RPL11 | RPL32 | EEF1A1 | JUN | RPL21 | MYC | CXCR4 | MYLIP | IL7R | CD3D | CD79A | SNHG7 | HMGB2 | EIF4A1 | HNRNPA2B1 | CORO1A | GZMK | PF4 | TUBA4A | FKBP1A | GDI2 | PFDN5 | LSP1 | YBX1 | SERF2 | CLIC1 | HNRNPA1 | EIF3K | SNX3 | UBC | SRP14 | PSMB9 | ITGB2 | PFN1 | GMFG | APOBEC3G | HCST | RAC2 | HLA-C | HSPA8 | CALM1 | RHOC | CTSC | NCF1 | FGR | LST1 | COTL1 | CLEC10A | HLA-DQB1 | HLA-DPA1 | IRF8 | HLA-DMA | S100A8 | LGALS2 | CFP | S100A10 | ISG15 | S100A6 | GPX1 | TYMP | TSPO | FCN1 | CTSS |
| HAVCR2 | FGFBP2 | GZMM | CST7 | NAP1L1 | PPDPF | MT-CYB | TXNIP | FAU | RPL8 | CD48 | DUSP1 | ZFP36L2 | FYB | RPS8 | RPS12 | RPLP1 | RPL6 | RPLP2 | RPS4X | NPM1 | RPS3A | SIT1 | ISG20 | ATM | NOSIP | CD3E | MS4A1 | SNX2 | EIF1AY | SLC25A5 | HMGB1 | CHCHD2 | LAG3 | HIST1H2AC | CDC42SE2 | TALDO1 | ATP5C1 | SLC25A6 | ATP6V0E1 | LY6E | ARPC1B | SUPT4H1 | RPL36AL | ATP5E | UQCRH | MYL12B | PSME1 | PPP1CA | CD63 | MYL6 | CAPZB | CDC37 | ID2 | UCP2 | HLA-E | EVL | CD99 | CDKN1C | MYO1G | LYN | CD86 | IFITM3 | SAT1 | ENHO | HLA-DQA2 | HLA-DRB1 | LAT2 | LY86 | S100A12 | MS4A6A | CPVL | NFKBIA | ANXA2 | S100A11 | AP1S2 | LGALS3 | RAC1 | NCF2 | NPC2 |
| CCL4L1 | CCL4 | LYAR | PRF1 | SOD1 | ATP6V1G1 | MT-ND2 | CIRBP | UBA52 | RPL29 | APRT | ITM2B | TMEM123 | GIMAP1 | RPL10A | RPL3 | RPL15 | RPL26 | RPS16 | RPL27A | RPSAP58 | RPS27A | BIRC3 | CD69 | ANKRD44 | GIMAP5 | CD7 | TCL1A | PRKCB | MANF | TPI1 | HSP90AA1 | ENO1 | SDPR | FERMT3 | H3F3A | PRDX6 | ATP5G2 | BRK1 | RHOA | ALDOA | IFI35 | COX7C | ATP5L | GABARAPL2 | YWHAB | RBM3 | PSMB8 | CTSD | ARPC2 | TUBA1A | ARGLU1 | RNF181 | CD53 | ARL6IP5 | IFITM1 | SEPT7 | CKB | ABI3 | POU2F2 | CD300C | CFD | PSAP | DNASE1L3 | CD1C | HLA-DRB5 | EAF2 | HLA-DMB | RBP7 | CD14 | IGSF6 | AMICA1 | PRELID1 | GRN | TNFSF13B | PYCARD | CDA | BRI3 | ||
| SPON2 | HOPX | GZMH | CYTIP | PRR13 | MT-ATP6 | LIMD2 | RPS24 | GNB2L1 | GSTK1 | KLF6 | BTG2 | CITED2 | RPS5 | RPS15A | RPS15 | RPL14 | RPS28 | TPT1 | HINT1 | RPL9 | RIC3 | STK17A | CARS | C12orf57 | CD2 | IGLL5 | ARL4A | HMGA1 | PKM | HMGN1 | SLC25A3 | TSC22D1 | DNAJB6 | GSTO1 | DNAJC8 | ZFP36 | C11orf31 | CALM2 | LAMTOR4 | IRF7 | CNBP | PSMA7 | POLD4 | OST4 | C19orf43 | TPM3 | MYO1F | EMP3 | PGK1 | TMCO1 | CKLF | PLAC8 | BIN2 | AES | PDIA3 | LYPD2 | ATP1B3 | SCPEP1 | FAM49A | SERPINA1 | TIMP1 | SERPINF1 | FCGR2B | PLD4 | MEF2C | RNASE6 | FOLR3 | ALDH2 | RAB32 | MYADM | IFI6 | CEBPD | RNF130 | TKT | SLC7A7 | CTSB | |||
| CLIC3 | C12orf75 | KLRD1 | CLNS1A | SYF2 | SSR2 | KLF2 | COX4I1 | RPS11 | CD44 | IER2 | NDFIP1 | SEPW1 | RPS23 | RPL23A | RPS7 | RPL27 | RPL36 | RPL7 | EIF4A2 | BTG1 | STMN3 | ACAP1 | DNAJB1 | LCK | IGJ | HVCN1 | STMN1 | COX8A | SRSF7 | LDHA | NRGN | LIMS1 | SOD2 | YWHAH | SERP1 | COPE | RNASET2 | COX5B | MIR142 | TMA7 | COX6B1 | SEP15 | NDUFA4 | SUMO2 | COX6A1 | FLNA | PSME2 | ARF1 | ADRM1 | IL10RA | CD164 | RAP1B | RARRES3 | LITAF | VMO1 | SPN | SYNGR2 | ARRB1 | CD68 | CEBPB | LILRA4 | PPP1R14A | CCDC50 | GAPT | C19orf59 | IL8 | CD33 | PPT1 | C1orf162 | FCGRT | STX11 | AP2S1 | FPR1 | BLVRA | |||||
| XCL2 | CD8A | CD160 | CCT7 | EVI2B | TMEM14B | HNRNPDL | BTF3 | RPL23 | PPP1R15A | GPSM3 | RWDD1 | GIMAP2 | RPL18 | RPL30 | RPL7A | EEF1D | RPL22 | RPL35A | SNRPD2 | GLTSCR2 | CRIP2 | RHOH | PRDX2 | LAT | LINC00926 | SYPL1 | PRDX4 | PSMB3 | RAN | PPIB | TUBB1 | MAX | POLE4 | MRPL14 | GPX4 | COX5A | DRAP1 | TCEB2 | LAP3 | EIF3F | VAMP8 | LSM7 | SKP1 | EIF3G | GUK1 | GLIPR2 | ARPC5 | LRRFIP1 | PPP2CA | CMTM3 | PPP1R18 | HLA-F | TRAF3IP3 | XBP1 | ADA | LYST | TCF7L2 | SPI1 | STXBP2 | PHACTR1 | CSF3R | CD302 | CORO1B | RHOG | CTSH | MNDA | ATP6V0B | MAFB | RHOB | ||||||||||
| TTC38 | KLRG1 | FCRL6 | ATP1A1 | YPEL5 | C19orf24 | ANAPC16 | PABPC1 | EEF2 | PLP2 | EDF1 | SOCS3 | DGCR6L | RPL5 | RPS25 | RPL35 | RPL24 | C6orf48 | RPL34 | SELL | FOXP1 | TNFRSF4 | BIN1 | CD27 | CD3G | BANK1 | ADK | CHTF8 | GNB2 | HNRNPK | SRSF3 | GNG11 | AP3S1 | RAB10 | SPINT2 | ATP5D | WDR83OS | SH3BGRL | BST2 | TMEM179B | UBE2D3 | ARL6IP4 | RASGRP2 | COX6C | TMBIM6 | TRAPPC1 | EFHD2 | ATP5B | CALM3 | PDHB | FYN | SCP2 | DHRS7 | IL2RG | ANXA6 | SH3BP1 | LYL1 | IFI30 | ASAH1 | PLBD1 | LILRB4 | TMEM14C | MT2A | BLVRB | FGL2 | RGS2 | FCGR2A | PLAUR | ||||||||||||
| AKR1C3 | ZAP70 | XCL1 | ARID4B | DAZAP2 | MPC2 | VAMP2 | EIF3H | NBEAL1 | KHDRBS1 | UQCR11 | TRADD | TMEM173 | RPS10 | RPL31 | RPL4 | RPL38 | C21orf33 | RPL17 | PEBP1 | RP11-796E2.4 | RP11-706O15.1 | OCIAD2 | LEPROTL1 | CD8B | VPREB3 | SMIM14 | SP140 | UQCRFS1 | MIF | PARK7 | RGS18 | CTSA | PTPN18 | H2AFJ | HIGD2A | FIS1 | BLOC1S1 | SF3B5 | PLSCR1 | ERP29 | NEDD8 | MX1 | PTPRC | SAP18 | CAP1 | PLEK | MSN | FKBP8 | TPST2 | ETHE1 | TMEM9B | TMEM50A | CCND3 | YWHAQ | GUSB | STX7 | APOBEC3A | BID | ASGR1 | RAB31 | DHRS4L2 | CSTB | NUP214 | KLF4 | LINC00936 | TMEM176B | CTSZ | ||||||||||||
| PRSS23 | APMAP | KLRC1 | THYN1 | SP110 | TMED4 | UBE2D2 | PNRC1 | EIF3L | SURF1 | SSR4 | CHURC1 | GBP1 | RPS13 | RPS20 | RPL37A | RPS4Y1 | C1orf228 | RPL36A | CMPK1 | KCNQ1OT1 | KRT1 | FAIM3 | PIK3IP1 | OPTN | FCER2 | MYCBP2 | HIST1H4C | ANAPC11 | SRSF2 | PSMB1 | CLU | NCOA4 | GLUL | TPM4 | FOSB | PSMB7 | RNH1 | GLRX | TMEM205 | GTF3A | NDUFA13 | MAF1 | PTGES3 | JTB | LCP1 | CD300A | HN1 | ATRAID | PPP6C | PTGER2 | ZNF207 | SIGIRR | SRP9 | ALOX5AP | FAM110A | YBX3 | MS4A7 | WARS | TNFAIP2 | CD4 | C10orf54 | IL1B | ODF3B | CNPY3 | VSTM1 | TBXAS1 | |||||||||||||
| GPR56 | SAMD3 | RP11-347P5.1 | CCDC12 | MRPL21 | UXT | UBXN1 | ZFAS1 | STAT1 | MT-ND5 | STAT3 | DNAJA2 | RPLP0 | RPS29 | RPL37 | TOMM7 | COMMD6 | PPA1 | LBH | RGCC | ITM2A | PDLIM1 | CYB561A3 | PCNA | PGAM1 | HNRNPC | H2AFZ | CD9 | RHEB | RNPEP | COMMD3 | NDUFA11 | SRSF9 | PTPN6 | SERPINB1 | TLN1 | PCBP2 | ATP5G3 | RCSD1 | ACTR3 | EID1 | HCLS1 | AOAH | DOK2 | M6PR | ATXN10 | CDC42EP3 | NDUFS2 | SH3KBP1 | FUS | TMED9 | ASB8 | SNX10 | HCK | NINJ1 | BST1 | UBE2Q1 | ANXA5 | PGD | SMCO4 | CAPG | RETN | G0S2 | |||||||||||||||||
| HBA1 | NCR3 | TTC3 | STUB1 | TMEM165 | PNISR | RPL39 | HPCAL1 | PCBP1 | TNFAIP3 | SORL1 | EEF1B2 | RPS26 | EIF3D | SRSF5 | ST13 | ANKRD12 | GPR183 | TCF7 | RORA | P2RX5 | MBD4 | NME1 | TIMM13 | SPCS1 | GHITM | ACRBP | ODC1 | FAM45A | ETFA | MCL1 | ARF5 | ZNF706 | ATP6V1F | ANXA4 | HSP90AB1 | UBL5 | DDT | COX7A2 | NDUFB8 | ATP5F1 | CX3CR1 | PTP4A2 | C1orf43 | TMX2 | GYG1 | PSMC4 | RASSF5 | DUSP2 | SUN2 | UTRN | NOTCH2NL | RP11-290F20.3 | CPPED1 | FCGR1A | H2AFY | MSRB1 | TGFBI | LGALS9 | MTMR11 | C5AR1 | |||||||||||||||||||
| PTGDS | CHST12 | RBM4 | C19orf70 | LYPLA1 | RBMX | ZFP36L1 | RBMS1 | ICAM3 | UBAC2 | PTGER4 | RPS21 | ZFAND1 | MZT2B | CCNI | CCDC109B | SLC2A3 | RCN2 | SPOCK2 | SH2D1A | BLK | IFT57 | SNRNP25 | MOB1A | C1QBP | PRDX1 | MMD | RBBP6 | MPP1 | CIR1 | LAMTOR1 | PPP4C | GNG5 | UBE2L6 | LAIR1 | ATP5O | COX7A2L | HERPUD1 | CIB1 | C9orf16 | VPS28 | MIR4435-1HG | CAPZA2 | GBP2 | RPL7L1 | IFI44 | SRP72 | SYTL1 | GYPC | TAP1 | TMEM18 | BTK | LILRA5 | DUSP6 | QPCT | PGLS | VCAN | NUDT16 | SAT2 | CSTA | MPEG1 | |||||||||||||||||||
| PTPN7 | DSCR3 | TNFRSF14 | PCSK7 | SON | APEX1 | USP3 | NDUFB11 | HSPB1 | MT1X | SNHG8 | FBL | RPL41 | AIMP1 | ARHGAP15 | PPM1K | CCR7 | ETS1 | FCRLA | PPAPDC1B | AHCY | CYC1 | TRMT112 | SNRPB | CA2 | AMD1 | PLEKHO1 | ADIPOR1 | SCAND1 | EIF5 | VASP | POLR2L | DDAH2 | HNRNPA0 | CRIP1 | PAPOLA | TMEM59 | UFC1 | DBI | ACTN4 | RAB8A | VPS29 | CLEC2B | SMIM12 | ZFAND6 | RASAL3 | PPP2R5C | BTN3A2 | INSIG1 | UNC93B1 | PILRA | TESC | PID1 | CARD16 | ID1 | PRAM1 | IFNGR2 | CYBB | ||||||||||||||||||||||
| TIGIT | BAZ1A | POLR2I | TBC1D10C | EIF2S3 | ACTR1B | CHMP4A | MRPS33 | RCBTB2 | EEF1G | EBPL | CUTA | TNFAIP8 | ARID5B | AQP3 | CDC25B | MZB1 | NAT9 | MCM5 | SNF8 | ERH | UBE2I | PTCRA | GRAP2 | MTHFD2 | FDFT1 | GNAS | LAMTOR5 | RBX1 | SEC11A | PARP14 | ANP32B | ATP5H | RTFDC1 | HNRNPF | ARF6 | DYNLL1 | ASCL2 | IDH2 | MKRN1 | EMG1 | FLOT1 | PMAIP1 | MAEA | DDIT4 | PRMT2 | CUX1 | SCIMP | LRRC25 | SLC16A3 | CXCL2 | CASP1 | CD1D | APLP2 | SLC11A1 | |||||||||||||||||||||||||
| PRR5 | MRPL18 | ARF4 | FAM65B | MED30 | SSU72 | RP11-51J9.5 | HNRNPH3 | EIF3E | PCNP | PPP3CC | FLT3LG | BCL11B | CD72 | RBM5 | FABP5 | FIBP | CNN2 | SEC61B | SPARC | RSU1 | SNN | GRSF1 | HSD17B11 | REEP5 | RGS19 | CASP4 | LMO4 | ATP5A1 | NDUFB10 | LSM10 | C11orf58 | HNRNPM | PSMB10 | MPST | CLTB | MYH9 | DPM1 | PSTPIP1 | RAB11B | RAB37 | TERF2IP | BUB3 | UBLCP1 | ALOX5 | LILRB2 | CTSL | EREG | ARRB2 | NCOR2 | JUND | CLEC7A | ||||||||||||||||||||||||||||
| KLRB1 | SMIM7 | N4BP2L1 | PPP1R2 | RP11-349A22.5 | RAP1A | IL27RA | EPB41L4A-AS1 | RSL1D1 | MZT2A | LY9 | TRAT1 | GATA3 | TSPAN13 | TPD52 | PTTG1 | TMEM208 | DAD1 | NDUFA1 | MYL9 | NT5C3A | TAX1BP3 | ILK | CHMP1B | ATP5EP2 | RTN4 | NOP10 | SKAP2 | NDUFB9 | UQCR10 | RPL22L1 | RBM39 | HNRNPA3 | PRDX5 | TMEM140 | PSMA4 | NMI | VPS25 | LINC00152 | NT5C | SEPT1 | HSPA5 | CDIP1 | HHEX | C19orf38 | NAGA | TNFSF10 | GSN | C4orf48 | SULT1A1 | ||||||||||||||||||||||||||||||
| MATK | TRIM22 | GMPR2 | NCOR1 | KIAA0040 | POLR1D | PRMT10 | IMPDH2 | CCNL1 | DDX24 | RP1-313I6.12 | PRKCQ-AS1 | SIRPG | HLA-DOB | ITM2C | MCM7 | MX2 | SERBP1 | ABRACL | GP9 | R3HDM4 | MFSD1 | CNDP2 | ERGIC3 | POLR2J | CHCHD10 | C20orf24 | EPN1 | C19orf53 | ATP5J2 | ZC3H15 | CDC42 | EIF3I | CSNK2B | OASL | MPC1 | CRELD2 | DNAJC1 | ATF6B | ARL4C | WIPF1 | PPM1N | SYK | SLC31A2 | GNAI2 | RASSF4 | AGTRAP | GNS | ||||||||||||||||||||||||||||||||
| IL2RB | GRK6 | C10orf32 | SP100 | IFI44L | UBE2K | TAF1D | NSA2 | TSTD1 | PITPNA-AS1 | LEF1 | GPR171 | SPIB | CXXC5 | FEN1 | TXNDC17 | TUBB | MDH2 | F13A1 | NGFRAP1 | MARCH2 | HIST1H2BK | IDH3G | SUMO3 | NDUFB5 | TMEM219 | U2AF1 | MRPL23 | LSM2 | YWHAZ | RABAC1 | NDUFB2 | UBE2B | MOB2 | SMARCA4 | FDPS | TECR | ARPC5L | CD83 | HMOX1 | DYNLT1 | ENTPD1 | GCA | ADAP2 | ||||||||||||||||||||||||||||||||||||
| S100B | C1orf63 | MESDC2 | MEAF6 | IGBP1 | LGALS3BP | ZNF331 | ILF3-AS1 | SBDS | NUP54 | MAL | AC092580.4 | PKIG | MAP3K8 | MAD2L1 | TIMM8B | SUB1 | PSMB6 | TMEM40 | CD82 | PARVB | THOC6 | MT-ND3 | DBNL | OAS1 | CLTA | ISCU | ATP5I | EIF1B | C4orf3 | ENSA | RAB7A | RNF149 | CMC1 | CTD-2035E11.3 | GNG2 | SEPT6 | LAMP1 | IFIT2 | LILRA3 | NAAA | OSM | C20orf27 | |||||||||||||||||||||||||||||||||||||
| CD320 | PPCS | POLR3K | RNPS1 | METTL9 | CAMK2G | YME1L1 | PSIP1 | CHI3L2 | SUSD3 | BLNK | IL4R | TYMS | MRPL28 | C14orf166 | ATP5J | TREML1 | PTTG1IP | CORO1C | POP7 | OXA1L | SNX17 | ATG3 | NDUFS7 | MORF4L1 | NDUFA2 | EZR | SEPT9 | TMEM258 | FAM49B | SQRDL | RAB9A | RAB27A | PHF14 | PIM1 | ARL6IP1 | SIGLEC10 | CSF1R | EPSTI1 | SULF2 | SCO2 | |||||||||||||||||||||||||||||||||||||||
| SH2D2A | ADAR | TLE4 | KRT10 | MRFAP1 | TMC8 | DDX18 | AAK1 | KIAA0125 | SWAP70 | EZH2 | SPCS3 | CYCS | UQCRQ | ITGA2B | ACTN1 | H1F0 | LMNA | TMEM147 | EIF6 | CD55 | AHNAK | NDUFS5 | ALKBH7 | DECR1 | PAIP2 | RBM8A | OSTF1 | UPP1 | MRPL19 | RRAGC | TMEM109 | ADD3 | MRPS6 | RELT | APOBEC3B | OAZ2 | MGST2 | NAPRT1 | |||||||||||||||||||||||||||||||||||||||||
| PLEKHF1 | DNAJC15 | MORF4L2 | EIF4B | NDUFA3 | ADSL | NDNL2 | LDLRAP1 | POU2AF1 | NXT2 | KIAA0101 | EIF3A | XRCC6 | CALR | CMTM5 | HIST1H1C | SOX4 | SLC39A3 | C1orf86 | XAF1 | MARCKSL1 | ZNHIT1 | NHP2L1 | RNF7 | MRPL43 | DNAJA1 | AP2M1 | ARPC4 | MGAT1 | DHX36 | FAM105A | MAPK1IP1L | HDAC1 | LSM14A | KYNU | GPBAR1 | HSBP1 | IER3 | EIF4EBP1 | |||||||||||||||||||||||||||||||||||||||||
| ZNHIT3 | SF3A1 | ARHGEF1 | CMTM7 | CCT2 | TTC39C | AL928768.3 | HELQ | RRM2 | ISOC2 | EIF4G2 | PSMD8 | CLDN5 | RUFY1 | DAPP1 | C14orf2 | WAS | ENY2 | VAMP5 | SMDT1 | MRPL54 | RNASEH2B | VAPA | IRF1 | ARHGDIA | MMP24-AS1 | GFER | CYB561D2 | ANXA2R | RNF167 | PIK3AP1 | RXRA | ATP6V0D1 | GMPR | NAGK | |||||||||||||||||||||||||||||||||||||||||||||
| PRKAR1A | SRSF11 | SQSTM1 | FAM107B | MGAT4A | RP5-887A10.1 | CENPN | GGCT | ATPIF1 | AURKAIP1 | ERV3-1 | RIOK3 | PQBP1 | PFDN2 | COX17 | SDCBP | CCDC85B | VDAC1 | YPEL3 | C9orf78 | TMEM230 | TBCB | ANAPC13 | CXXC1 | CELF1 | HMOX2 | C19orf66 | KIAA0930 | OSCAR | EIF4E2 | SRA1 | |||||||||||||||||||||||||||||||||||||||||||||||||
| APBB1IP | TAF7 | DPP7 | FGFR1OP2 | OXNAD1 | TNFRSF13B | GMNN | YIF1B | SPCS2 | MTDH | TUBA1C | ZNF263 | DNAJC7 | ACTR2 | LYSMD2 | SAMHD1 | EIF3M | HADHA | CCT6A | DYNLRB1 | KRTCAP2 | BAX | PLIN2 | STX18 | DTNBP1 | RPA2 | SLC9A3R1 | HES1 | LILRB1 | ZYX | MIR24-2 | |||||||||||||||||||||||||||||||||||||||||||||||||
| IKZF1 | KMT2E | FNBP1 | PIM2 | NUCB2 | TNFRSF17 | TK1 | HAUS4 | PSMC5 | PSMA5 | BBC3 | DERA | BNIP3L | ERP44 | GRB2 | SDHB | CSDE1 | PSMG2 | DRAM2 | SELK | PDCD6 | SELT | IFIT1 | PGM1 | BAZ2A | NDUFA5 | TSC22D4 | HES4 | TNFRSF1B | NR4A1 | ||||||||||||||||||||||||||||||||||||||||||||||||||
| SNAP23 | TAPSAR1 | MGST3 | DNAJB9 | RGL4 | GINS2 | HDGF | RANBP1 | H2AFV | PLA2G12A | PICALM | NENF | TWF2 | LSM6 | LAMTOR2 | SF1 | ETFB | CAPZA1 | PYURF | VDAC2 | TMBIM4 | ARRDC1 | DYNLL2 | HSH2D | DENND2D | BZW1 | CAMK1 | CAPNS1 | CECR1 | |||||||||||||||||||||||||||||||||||||||||||||||||||
| BEST1 | TCEA1 | NUCB1 | DEGS1 | CD40LG | ZWINT | PGP | DUT | TOMM22 | PGRMC1 | APP | AKR1A1 | EMC7 | ZFAND5 | NDUFS6 | TRA2B | COX14 | SMAP2 | S1PR4 | FAM96B | PSMD4 | IFI27 | APOBEC3C | TBCC | DEF6 | CD300E | UBE2D1 | MIDN | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| TAOK3 | UBXN4 | TGOLN2 | XXbac-BPG299F13.17 | TRABD2A | BIK | REEP3 | TBCA | NDUFC2 | FHL1 | MID1IP1 | NDUFS3 | CHMP4B | CD97 | NDUFV2 | PSMB4 | PLEKHJ1 | BCAP31 | LMAN2 | WDR1 | RSAD2 | ZBP1 | PBXIP1 | C5orf56 | CD300LF | THEMIS2 | GRINA | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| CARD8 | MED10 | PTPN2 | PARP1 | CD6 | CCNA2 | LMNB1 | HSP90B1 | LSM4 | SLC40A1 | HADHB | BAG1 | SMS | TGFB1 | UQCRC2 | PSMD9 | TMEM256 | ICAM2 | TAPBP | NDUFB7 | ATP5SL | ABHD14B | RSRC2 | ARHGEF40 | NANS | PTPRE | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| KLF3 | SEC62 | EVI2A | TOB1 | SH3YL1 | BIRC5 | ECH1 | PDIA6 | PNMA1 | TREX1 | LRPAP1 | CAT | GLIPR1 | EIF2S2 | ADI1 | BLOC1S2 | TMED2 | SUMO1 | RAB5C | COMMD10 | RGS1 | HDDC2 | CXCL16 | ATOX1 | CARS2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LIPA | CCT4 | TNIP1 | SVIP | CAMK4 | XRCC5 | TUFM | TPM1 | PARL | SHKBP1 | LTA4H | RILPL2 | MINOS1 | NAA10 | WBP2 | SSBP1 | GADD45B | CHMP2A | CD38 | IFRD1 | SMARCE1 | TPPP3 | MAPKAPK3 | DNTTIP1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZNF394 | KIF5B | EMC10 | TMEM261 | SATB1 | HNRNPU | P4HB | CCDC69 | TRAPPC2L | FAM32A | SSR3 | MRPL20 | TXN2 | EIF5B | ANXA11 | GADD45GIP1 | SRI | ITGB7 | DDX6 | LILRA2 | CBR1 | MYO9B | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CLK3 | STK4 | MTIF3 | BEX4 | FHIT | NCL | DEK | HTATIP2 | C7orf50 | PMVK | UQCRC1 | TOMM20 | MRPL41 | MMADHC | CD47 | ACP1 | SIVA1 | DNAJC19 | SDF4 | EMR2 | TNFRSF1A | UBE2R2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SNX9 | POLR3GL | FRG1 | CYLD | USP10 | NHP2 | UBE2L3 | ARHGAP4 | BNIP2 | YWHAE | SERPINB6 | C19orf60 | NDUFA7 | SLTM | NDUFB4 | PSMA1 | RALY | RNF139 | DERL1 | MS4A4A | ADRBK1 | LACTB | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| OCIAD1 | CSRNP1 | ASF1A | UXS1 | PA2G4 | C19orf10 | MRPL40 | MRPS23 | PARVG | LSMD1 | IDS | RNF187 | HAGH | NDUFA12 | RPS19BP1 | BSG | FKBP11 | PRPF31 | CTD-2006K23.1 | TCIRG1 | CDKN1A | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MPHOSPH8 | SLC38A1 | CISH | NOL7 | MYEOV2 | MTHFS | FAM173A | ACAA1 | PHPT1 | ATF4 | OLA1 | COQ7 | PSMA2 | RPN2 | TCF25 | SKAP1 | SPSB3 | C1QA | NAMPT | CREG1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CRBN | CDKN1B | PASK | MDH1 | PSMB2 | DNAJC4 | HBP1 | NFKBIZ | NDUFB1 | DGUOK | MRPL52 | FBXW5 | SSB | BANF1 | POLR2G | NAP1L4 | EBP | ZNF703 | ZNF106 | FUOM | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TMEM243 | LSM5 | TNFRSF25 | CCT8 | MRPL51 | PRPF8 | TIMMDC1 | VMP1 | SPG21 | MRPS21 | MPG | PNKD | CELF2 | HMGN3 | NDUFS8 | YTHDF2 | GCHFR | CEBPA | AP2A1 | FBP1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PPIG | G3BP1 | CCDC104 | HSPE1 | TXN | ZNF581 | ABTB1 | CYTH4 | IFI27L2 | CAMLG | NDUFA9 | TINF2 | RPS27L | KARS | BUD31 | STT3B | MFSD10 | ALDH3B1 | C11orf21 | PDXK | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MED4 | MPHOSPH10 | INTS12 | PHB | SNRPD1 | ACAP2 | ZNF511 | MTPN | VIMP | PPP1CC | EMC6 | SMARCB1 | NUDC | SHISA5 | RNF213 | REXO2 | UBA2 | C1QB | NRROS | PLIN3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| UGP2 | THAP7 | NELL2 | CCT3 | SNRPC | RPL26L1 | TPP1 | MYD88 | COMT | NDUFAF3 | MVP | SET | CAPN2 | IMP3 | ATP6AP2 | RBL2 | ALDH9A1 | HCAR3 | MANBA | ATF3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DCK | ID3 | AKTIP | EWSR1 | MRPS34 | TRMT1 | IFNGR1 | AKIRIN2 | MAP2K3 | ANXA7 | NDUFC1 | LRCH4 | IK | C16orf13 | MAP1LC3B | NOP58 | ORAI1 | CXCL3 | MBOAT7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| WTAP | SLC25A45 | LINC00176 | HNRNPR | PSMD7 | FAM195A | DOCK2 | COMMD9 | CHCHD5 | HAX1 | COX7B | SIAH2 | EIF4H | CDC42SE1 | RER1 | CD96 | SURF4 | PRKCD | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ORMDL1 | RP11-489E7.4 | CD28 | CBX3 | ILF2 | FAM192A | IFIT3 | STX10 | MBNL1 | MRPS16 | UBE2J1 | HMHA1 | SPAG7 | FMNL1 | SH3BP5 | B3GAT3 | SGK1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CCDC107 | GPATCH4 | SCGB3A1 | CACYBP | CCT5 | CINP | ERICH1 | HM13 | NR4A2 | ZNRD1 | C12orf10 | JAK1 | SDHC | DCTN3 | EMB | PLGRKT | RAB34 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TAGAP | TNRC6C | OSTC | ATP5G1 | ARPC1A | VKORC1 | MRP63 | MKKS | SMCHD1 | TRAM1 | EMC4 | PCMT1 | PRKCH | GBP5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PDCD4 | RP11-291B21.2 | THOC7 | SHFM1 | RFXANK | ECHDC1 | SF3B2 | STRA13 | TANK | COPS6 | APH1A | CDKN2D | NUDT16L1 | PRKD2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TCP1 | HAPLN3 | PRMT1 | ANP32A | PCGF5 | SLA | RAD23A | GPI | STX5 | RAB2A | ARL5A | RBCK1 | ODF2L | TRPV2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CCNG1 | HSPD1 | NDUFAB1 | SNX5 | SNAPIN | TRAPPC6A | ANAPC15 | THRAP3 | DSTN | SF3B14 | LAPTM4A | C14orf1 | STAMBP | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NONO | MATR3 | NUTF2 | ERCC1 | RTN3 | PHB2 | KXD1 | FAM96A | CCDC115 | PIN1 | CSK | CYB5B | MLLT11 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| IL16 | EIF5A | HSD17B10 | NCKAP1L | PEPD | RSL24D1 | MRPL16 | ISCA2 | RHOF | SRRM2 | CHMP5 | PSMD5-AS1 | SYNE1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LUC7L3 | PDCD5 | POLR2E | IL10RB | STK38 | STK17B | SLIRP | AAMP | EAPP | CFLAR | SSNA1 | ORMDL3 | TFDP2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| FNTA | UBE2N | PPP1R7 | PFKL | C9orf89 | LINC00493 | UQCRB | CHPT1 | COPS5 | CCNDBP1 | WASF2 | CCDC167 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| N4BP2L2 | MAGOH | SEC11C | UNC119 | CDV3 | GGNBP2 | CTNNBL1 | REL | SNX6 | COPZ1 | KDELR2 | GIMAP6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TPR | PPHLN1 | SNRPD3 | ATP2B1 | HEXB | NSMCE1 | FAM50A | ITGAE | SARS | TMED10 | COX6A1P2 | PYCR2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| G3BP2 | VDAC3 | PSMC3 | RABGAP1L | ZNF524 | CWC15 | KDELR1 | USF2 | FBXO7 | DDOST | SELPLG | RNF115 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ELOVL5 | NUCKS1 | OTUB1 | NFYC | VMA21 | PDCD2 | IRF2BP2 | ITGA4 | GNAI3 | SDHD | WSB1 | PTPN4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SCAF11 | AIP | MRPS7 | MTSS1 | PLD3 | MRPS18B | PSMD11 | NSFL1C | CCM2 | MRPL34 | C7orf73 | SYNE2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PRPF38B | RBM17 | PSMA3 | RALB | ATP6V1B2 | PRRC2C | HARS | GRPEL1 | SNHG15 | TMEM160 | CMTM6 | TMEM87A | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SLC3A2 | GTF2A2 | CDK2AP2 | DPEP2 | CHIC2 | CNPY2 | SEC13 | SAMD9L | ABT1 | ZMAT2 | SUCLG1 | RBM38 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DPY30 | STRAP | SRM | SNAP29 | OAS3 | RP11-1143G9.4 | FAM89B | CCS | RNF5 | IFI16 | PRDX3 | THAP11 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CYTH1 | SNRPE | UFD1L | GCH1 | CEBPG | NECAP2 | NIT2 | ACO2 | FAM162A | TRAPPC3 | MIEN1 | OBFC1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CCDC59 | EPC1 | POMP | GINM1 | CBWD1 | DDX17 | FLYWCH2 | MRPL12 | CCNH | C17orf62 | RNASEH2C | CD59 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| WHSC1L1 | RNF126 | MRPL11 | RIPK2 | TXNL1 | PTOV1 | STARD7 | KTN1 | PDAP1 | SH3GLB1 | CCND2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SF3B1 | ADH5 | GTF3C6 | UTP6 | H1FX | FH | TRIM38 | LCP2 | RNF166 | AKAP13 | PHACTR4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RBM23 | PPM1G | SDF2L1 | PACSIN2 | MED28 | TRA2A | BCL7B | TXNDC12 | ELOF1 | GOLT1B | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| BBX | SNRPF | PSMC2 | LINC01003 | SRRM1 | COMMD5 | GLUD1 | IDH3B | NDUFA6 | MTFP1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NAA38 | WBSCR22 | NDUFB3 | RBM25 | CKS1B | ELF1 | MRPL55 | ATP6V1E1 | CXCR3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| IRF9 | MAPRE1 | TUBB4B | DCXR | DUSP22 | COMMD8 | POLR2F | FKBP2 | GALM | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MAT2B | METTL23 | XRN2 | SFPQ | PTPN1 | DCTN2 | ARHGAP30 | CAST | ACD | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| GPBP1 | SNRPA1 | CARHSP1 | CHMP3 | SRSF6 | TMEM126B | DDX46 | SFT2D1 | TNFRSF18 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CHD2 | AKR1B1 | C11orf48 | DAP3 | TEN1 | NME4 | COMMD7 | CISD3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| EGLN2 | PPP1R12A | NDUFS4 | MRPS15 | COMMD4 | CCZ1 | PPP1R11 | NDUFB6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ARL2BP | STOML2 | TCEB1 | EIF2A | TSSC1 | RPF1 | AUP1 | UBE2F | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RBBP7 | MRPL9 | IMP4 | MFNG | CWC25 | YY1 | PPP2R1A | PSENEN | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RPAIN | MFF | MRPS18C | PSMF1 | CEPT1 | RAB11A | PSMD13 | FAM204A | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NKTR | PITHD1 | SEC61G | PHF5A | CHD9 | KPNB1 | PET100 | SCAMP2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PNN | ANAPC5 | ZDHHC12 | TAF9 | CD81 | PDCL3 | ITGB1BP1 | NAPA | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DARS | CDC123 | PTBP1 | NDUFA10 | RFC2 | GLG1 | TEX264 | OS9 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NCBP2 | SNRPG | HNRNPD | SNW1 | ACP5 | METTL5 | CSNK1A1 | ASNA1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| AATF | LYRM4 | NUDT1 | FDX1 | UROS | LSM1 | MLX | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RSF1 | GLRX3 | PAK2 | ECHS1 | MAGED2 | GSDMD | MEA1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CHCHD7 | SRP19 | SEPT2 | MT-ND4L | ZCRB1 | RTF1 | UBE2A | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| AL592183.1 | C17orf89 | AK2 | DHPS | CCDC53 | NME3 | EMC3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| BTF3L4 | SNRNP70 | RPS6KB2 | PMF1 | APOL3 | TAF12 | STK10 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| FLI1 | PSMD6 | TIMM17A | SUCLG2 | SHARPIN | PUF60 | TPGS1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NR3C1 | NUDT5 | DCTPP1 | PRKCSH | PPA2 | IAH1 | TMEM141 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ITSN2 | DDX39A | HMGN2 | BABAM1 | ZFAND2B | PDLIM2 | PKN1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZC3HAV1 | HSPA9 | MTCH2 | PABPC4 | SLFN5 | MGMT | UBE2E3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MTERFD2 | VCP | MRPL47 | ATG12 | CCDC90B | MRPS12 | CCDC124 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| GRAMD1A | SRSF1 | UQCC2 | NXT1 | TMBIM1 | BRD2 | COA3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZNF24 | EIF4A3 | MRPL36 | PPIE | TSNAX | CANX | RPN1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| USE1 | CLPP | NKAP | PIH1D1 | SEPHS2 | NAA20 | RAB4A | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| USP16 | TTC1 | DAZAP1 | UBE2E1 | SLC44A2 | NOP56 | TADA3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ARFGAP2 | IDI1 | MRPL15 | QARS | ASCC2 | FUNDC2 | MRPS11 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| FRG1B | MRPS26 | TIMM10 | GID8 | BFAR | MTCH1 | DNAJC2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TIMM9 | HNRNPUL1 | DTYMK | MRPL3 | CCDC25 | TXNL4A | FKBP5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MARCH7 | HINT2 | TOP1 | USP15 | NRBP1 | PTRHD1 | RRP7A | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LENG1 | SLBP | PSMG3 | MRPL32 | PHF3 | SLC25A11 | DNAJB11 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CYB5A | PSMD14 | DLD | MAPKAPK5-AS1 | SYS1 | COX16 | YIPF3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SMIM19 | FKBP3 | DCPS | PTP4A1 | PPM1B | NDUFV1 | DIAPH1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LINC-PINT | SDHA | TFDP1 | EIF1AX | YAF2 | ARHGAP9 | HNRNPH2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| OARD1 | NCBP2-AS2 | RP11-139H15.1 | LNPEP | LSM3 | PFDN6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TTC14 | SNRNP40 | FXR1 | PEX16 | GPAA1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TMEM242 | RAD21 | EIF2B1 | SRSF4 | MAPK1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DDIT3 | VBP1 | ESD | PHF11 | ACAA2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SAFB2 | C14orf119 | SF3A3 | VTI1B | PANK2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CEBPZ-AS1 | FAM177A1 | MCTS1 | AKR7A2 | IFNAR2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| BRD9 | HPRT1 | CCNC | URM1 | MAD2L2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PRDM2 | SLC25A39 | COA6 | AHSA1 | CCDC88C | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CD84 | RPA3 | PHYKPL | JAGN1 | RAB4B | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| EGR1 | SNRPA | CAMTA1 | GRHPR | G6PD | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| GLRX5 | C19orf25 | ROMO1 | ARFGAP3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TIAL1 | CHCHD1 | PFDN1 | BCCIP | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| EXOSC8 | EIF4E | STX8 | RABL6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LBR | C11orf83 | RNPEPL1 | SMC4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ILF3 | SREK1 | TMEM248 | MVD | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SAP30BP | ANKRD11 | GGA1 | MRPS28 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ACTR10 | TMEM138 | IFT20 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NUDT21 | PTGES2 | AKAP9 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| UBE2G1 | LGALS8 | C18orf32 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LARP7 | MLEC | BRMS1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PPIH | C5orf15 | CAPN1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| EBNA1BP2 | FEM1B | DDRGK1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| VOPP1 | RAD23B | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| GNL3 | WDR33 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CISD2 | WDR61 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SSRP1 | SUGT1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PDIA4 | LAGE3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| SYNCRIP | RBBP4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| C1orf35 | ESYT1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MAP7D1 | APOA1BP | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DNAJC9 | MIF4GD | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HAUS1 | CFDP1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ILKAP | UBE2J2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RPUSD3 | MRPS5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CDKN2AIPNL | SRPK2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| POLD2 | FAM200B | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HNRNPAB | C17orf49 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TMEM106C | NUBP2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CBX1 | PRKAG1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| NAA50 | SURF2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MCM3 | SSSCA1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CISD1 | EI24 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| TFPT | CSNK1D | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| UBA5 | DCTD | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| LMAN1 | PEX2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PGRMC2 | PNKP | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| C19orf48 | TMEM70 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| C14orf142 | JMJD6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| PAICS | DCAF5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RCE1 |
If you want to quickly find which module a particular feature was
assigned to, the featureModuleLookup function can be used.
Here we will look up “CD3E”, a marker gene for T-cells, together with “S100A8”:
mod <- featureModuleLookup(sce, feature = c("CD3E", "S100A8"))
mod
##   CD3E S100A8
##     27     70
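Conceptually, this lookup is a mapping from feature names to module labels. The base R sketch below illustrates the idea with a small toy vector built from the assignments mentioned in this tutorial. The name toyModules is made up for illustration, and this is not how celda stores or retrieves module assignments internally.

```r
# Toy module labels for a handful of genes (illustration only;
# 'toyModules' is a hypothetical object, not part of celda).
toyModules <- c(CD3E = 27, S100A8 = 70, S100A9 = 70, CD14 = 71)

# Forward lookup: which module does each feature belong to?
toyModules[c("CD3E", "S100A8")]

# Reverse lookup: which features were assigned to a given module?
names(toyModules)[toyModules == 70]
```

The reverse lookup is often the more useful direction when interpreting a module, since it lists the genes that travel together.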
The function moduleHeatmap can be used to view the
expression of features across cells for a specific module. The
featureModule parameter denotes the module(s) to be
displayed. Cells are ordered from those with the lowest probability of
the module on the left to the highest probability on the right.
Similarly, features are ordered from those with the highest probability
within the module on the top to the lowest probability on the
bottom.
moduleHeatmap(sce, featureModule = 27, useAssay = useAssay, altExpName = altExpName)
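The ordering described above can be sketched in base R with made-up probabilities. All object names and values below are hypothetical and only illustrate the row and column arrangement, not the internals of moduleHeatmap.

```r
# Made-up module probabilities for six cells and made-up
# within-module probabilities for four features.
cellProb <- c(c1 = 0.02, c2 = 0.30, c3 = 0.11, c4 = 0.45, c5 = 0.01, c6 = 0.20)
featProb <- c(g1 = 0.10, g2 = 0.55, g3 = 0.05, g4 = 0.30)

# Columns: lowest module probability on the left, highest on the right.
cellOrder <- names(sort(cellProb))
# Rows: highest within-module probability on top, lowest at the bottom.
featOrder <- names(sort(featProb, decreasing = TRUE))

# Toy expression matrix, rearranged the way the heatmap would display it.
expr <- outer(featProb, cellProb)
dimnames(expr) <- list(names(featProb), names(cellProb))
expr[featOrder, cellOrder]
```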
The parameter topCells can be used to control the number
of cells included in the heatmap. By default, only the 100 cells with
the lowest probabilities and the 100 cells with the highest
probabilities for each selected module are included
(i.e. topCells = 100 by default). To display all cells,
this parameter can be set to NULL:
moduleHeatmap(sce, featureModule = 27, topCells = NULL, useAssay = useAssay, altExpName = altExpName)
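As a rough illustration of what topCells selects, the sketch below picks the cells at both extremes of a vector of module probabilities. The helper selectTopCells is hypothetical and not a celda function, and returning all cells when fewer than 2 * topCells are available is an assumption of this sketch.

```r
# Hypothetical helper (not part of celda): return the indices of the
# 'topCells' cells with the lowest and the 'topCells' cells with the
# highest module probability. NULL keeps every cell.
selectTopCells <- function(prob, topCells = 100) {
  ord <- order(prob)
  if (is.null(topCells) || 2 * topCells >= length(prob)) {
    return(ord)
  }
  c(head(ord, topCells), tail(ord, topCells))
}

set.seed(42)
prob <- runif(1000)               # made-up module probabilities
length(selectTopCells(prob))      # 200 cells with the default
length(selectTopCells(prob, NULL))  # all 1000 cells
```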
Note: Multiple modules can be displayed by giving a
vector of module indices to the parameter featureModule. If
featureModule is not specified, then all modules will be
plotted.
The function plotDimReduceModule can be used to visualize
the probabilities of a particular module or sets of modules on a reduced
dimensional plot such as a UMAP. This can be another quick method to see
how modules are expressed across various cells in 2-D space. As an
example, we can look at module 70, which contains S100A8:
plotDimReduceModule(sce, modules = 70, useAssay = useAssay, altExpName = altExpName, reducedDimName = "celda_UMAP")
Similarly, multiple modules can be plotted in a grid of UMAPs:
plotDimReduceModule(sce, modules = 70:78, useAssay = useAssay, altExpName = altExpName, reducedDimName = "celda_UMAP")
In this grid, we can see that module 70 (which has high levels of S100A8 and S100A9) is highly expressed in cell populations 2 and 3, module 71 (which contains CD14) can be used to identify all CD14+ monocytes, module 72 (which contains CST3) is expressed across both CD14 and FCGR3A (CD16) expressing monocytes, and module 73 (which contains CD4) is expressed broadly across both monocytes and dendritic cells as well as some T-cell populations. If we were interested in defining transcriptional programs active across all monocytes, we could examine the genes found in module 72. If we were interested in defining transcriptional programs for all CD14+ monocytes, we could examine the genes in module 71. These patterns can also be observed in the Probability Map.
In the celda probability map, we saw that the unknown T-cell population 13 had high levels of module 30. We can examine both module heatmaps and module probability maps to further explore this:
moduleHeatmap(sce, featureModule = 30, useAssay = useAssay, altExpName = altExpName)
plotDimReduceModule(sce, modules = 30, useAssay = useAssay, altExpName = altExpName, reducedDimName = "celda_UMAP")
Module 30 has high levels of genes associated with proliferation including HMGA1, STMN1, PCNA, HMGB2, and TUBA1B. We can therefore re-label these cells as “Proliferating T-cells”.
In addition to examining modules, differential expression can be used
to identify potential marker genes up-regulated in specific cell
populations. The function findMarkerDiffExp in the
singleCellTK package will find markers up-regulated in each
cell population compared to all the others.
# Normalize counts (if not performed previously)
library(scater)
sce <- logNormCounts(sce, exprs_values = useAssay, name = "logcounts")
# Run differential expression analysis
sce <- findMarkerDiffExp(sce, useAssay = "logcounts", method = "wilcox", cluster = celdaClusters(sce), minMeanExpr = 0, fdrThreshold = 0.05, log2fcThreshold = 0, minClustExprPerc = 0, maxCtrlExprPerc = 1)
## Warning: 'findMarkerDiffExp' is deprecated.
## Use 'runFindMarker' instead.
## See help("Deprecated")
The function plotMarkerDiffExp can be used to plot the
results in a heatmap. The topN parameter will plot the top
N ranked genes for each cluster.
# Plot differentially expressed genes that pass additional thresholds 'minClustExprPerc' and 'maxCtrlExprPerc'
plotMarkerDiffExp(sce, topN = 5, log2fcThreshold = 0, rowLabel = TRUE, fdrThreshold = 0.05, minClustExprPerc = 0.6, maxCtrlExprPerc = 0.4, minMeanExpr = 0)
## Warning: 'plotMarkerDiffExp' is deprecated.
## Use 'plotFindMarkerHeatmap' instead.
## See help("Deprecated")

Other parameters such as minClustExprPerc (the minimum
fraction of cells expressing the marker gene in the cluster) and
maxCtrlExprPerc (the maximum fraction of cells expressing the
marker gene in other clusters) can be used to control how specific each
marker gene is to each cell population. Similarly, adding a log2
fold-change cutoff (e.g. 1) can select for markers that are more
strongly up-regulated in a cell population.
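To make these thresholds concrete, the following base R sketch applies the same style of cutoffs to a single gene in a toy dataset. It is only an illustration of the idea and not the singleCellTK implementation: the expression values, the variable names (expr, cluster, pctIn, pctOut), and the use of a difference of mean log-expression as a stand-in for the log2 fold-change are all assumptions made for the example.

```r
# Toy log-normalized expression for one gene across 10 cells (hypothetical values)
expr <- c(2.1, 1.8, 0, 2.5, 1.9, 0, 0, 0.4, 0, 0)
cluster <- c(rep("A", 5), rep("B", 5))

inCluster <- cluster == "A"

# Fraction of cells with detectable expression inside and outside cluster A
pctIn <- mean(expr[inCluster] > 0)
pctOut <- mean(expr[!inCluster] > 0)

# Difference of mean log-expression, used here as a stand-in for log2 fold-change
log2fc <- mean(expr[inCluster]) - mean(expr[!inCluster])

# Apply cutoffs analogous to minClustExprPerc, maxCtrlExprPerc and log2fcThreshold
isMarker <- pctIn >= 0.6 && pctOut <= 0.4 && log2fc >= 1
```

In this toy case the gene is detected in most cells of cluster A and in few cells elsewhere, so it passes all three cutoffs. Raising the first cutoff or lowering the second would make the selection stricter.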
The plotCeldaViolin function can be used to examine the
distribution of expression of various features across cell population
clusters derived from celda. Here we can see that the gene CD79A has
high expression in the B-cell cluster and HMGB2 has high expression in
the proliferating T-cell population.
# Normalize counts if not performed in previous steps
library(scater)
sce <- logNormCounts(sce, exprs_values = useAssay, name = "logcounts")
# Make violin plots for marker genes
plotCeldaViolin(sce, useAssay = "logcounts", features = c("CD79A", "HMGB2"))
The celda package comes with two functions for generating
comprehensive HTML reports that 1) capture the process of selecting K/L
for a celda_CG model and 2) plot the results from the
downstream analysis. The first report runs both
recursiveSplitModule and recursiveSplitCell
for selection of L and K, respectively. To
recapitulate the complete analysis presented in this tutorial in the
HTML report, the following command can be used:
sce <- reportCeldaCGRun(sce, sampleLabel = NULL, useAssay = useAssay, altExpName = altExpName, minCell = 3, minCount = 3, initialL = 10, maxL = 150, initialK = 3, maxK = 25, L = 80, K = 14)
All of the parameters in this function are the same as those used
throughout this tutorial in the selectFeatures,
recursiveSplitModule, and recursiveSplitCell
functions. Note that this report does not do cell
filtering, so that must be completed before running this function. The
returned SCE object will have the celda_CG model with
selected K and L which can be used in any of
the downstream plotting functions as well as input into the second
plotting report described next.
The second report takes in as input an SCE object with a fitted
celda_CG model and systematically generates several plots
that facilitate exploratory analysis including cell subpopulation
cluster labels on 2-D embeddings, user-specified annotations on 2-D
embeddings, module heatmaps, module probabilities, expression of marker
genes on 2-D embeddings, and the celda probability map. The report can
be generated with the following code:
reportCeldaCGPlotResults(sce, reducedDimName = "celda_UMAP", features = markers, useAssay = useAssay, altExpName = altExpName, cellAnnot = c("total", "detected", "decontX_contamination", "subsets_mito_percent"), cellAnnotLabel = "scDblFinder_doublet_call")
User-supplied annotations to plot on the 2-D embedding can be
specified through the cellAnnot and
cellAnnotLabel variables. Both parameters will allow for
plotting of variables stored in the colData of the SCE on the 2-D
embedding plot specified by the reducedDimName parameter. For
cellAnnot, integer and numeric variables will be plotted
as continuous variables while factors and characters will be plotted as
categorical variables. For cellAnnotLabel, all variables
will be coerced to a factor and the labels of the categories will be
plotted on the scatter plot.
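The rule for how annotation columns are displayed can be sketched in base R. The data frame below is a hypothetical stand-in for the colData of an SCE, and the classification function is an illustration of the rule described above, not code taken from the report itself.

```r
# Toy per-cell annotations (hypothetical values)
annot <- data.frame(total = c(1200L, 3400L, 560L),
                    contamination = c(0.05, 0.12, 0.30),
                    doublet_call = c("singlet", "doublet", "singlet"),
                    stringsAsFactors = FALSE)

# Integer and numeric columns would be shown on a continuous scale,
# characters and factors as categories
plotType <- vapply(annot,
                   function(v) if (is.numeric(v)) "continuous" else "categorical",
                   character(1))
```

Running a check like this on the columns you plan to pass to cellAnnot is a quick way to spot a numeric variable that was accidentally stored as a character.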
The celda model factorizes the original matrix into three matrices:
1) module - The probability of each feature in each module (Psi)
2) cellPopulation - The probability of each module in each cell population (Phi)
3) sample - The probability of each cell population in each sample (Theta)
Additionally, we can calculate the probability of each module within
each cell (cell). The cell matrix can essentially be used to replace PCs
from PCA and is useful for downstream visualization (e.g. generating 2-D
embeddings). All of these matrices can be retrieved with the
factorizeMatrix function. The matrices are returned in
three different versions: unnormalized counts, proportions (normalized
by the total), or posterior estimates (where the Dirichlet concentration
parameter is added in before normalization).
# Factorize the original counts matrix
fm <- factorizeMatrix(sce)
# Three different versions of each matrix:
names(fm)
## [1] "counts" "proportions" "posterior"
# Get normalized proportional matrices
dim(fm$proportions$cell) # Matrix of module probabilities for each cell
## [1] 80 2675
dim(fm$proportions$module) # Matrix of feature probabilities for each module
## [1] 2639 80
dim(fm$proportions$cellPopulation) # Matrix of module probabilities for each cell population
## [1] 80 14
dim(fm$proportions$sample) # Matrix of cell population probabilities in each sample
## [1] 14 1
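The relationship between the three versions can be illustrated on a toy matrix in base R. This is a minimal sketch rather than celda's internal code: the counts are made up, and the concentration value of 1 is an arbitrary assumption chosen only to show the effect of adding it before normalization.

```r
# Toy counts: 3 modules (rows) x 2 cell populations (columns)
counts <- matrix(c(8, 2, 0,
                   1, 4, 5),
                 nrow = 3,
                 dimnames = list(paste0("L", 1:3), paste0("K", 1:2)))

# Proportions: each column normalized by its total
proportions <- sweep(counts, 2, colSums(counts), "/")

# Posterior estimate: add the concentration parameter before normalizing
beta <- 1  # hypothetical Dirichlet concentration parameter
smoothed <- counts + beta
posterior <- sweep(smoothed, 2, colSums(smoothed), "/")
```

Module L3 has zero counts in population K1, so its proportion is exactly zero, while its posterior estimate is small but positive. This smoothing is the practical difference between the two normalized versions.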
The parameter displayName can be used to change the
labels of the rows from the rownames to a column in the
rowData of the SCE object. This parameter is available in
plotDimReduceFeature and moduleHeatmap. For
example, if we did not change the rownames to
Symbol_TENx in the beginning of the tutorial, the following
code still could be run in moduleHeatmap to display the
gene symbol even if the rownames were set to the original
Ensembl IDs:
moduleHeatmap(sce, featureModule = 27, useAssay = useAssay, altExpName = altExpName, displayName = "Symbol_TENx")
## R version 4.3.3 (2024-02-29)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.4.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scater_1.30.1 scuttle_1.12.0
## [3] kableExtra_1.4.0 knitr_1.45
## [5] ggplot2_3.5.0 celda_1.18.2
## [7] singleCellTK_2.12.2 TENxPBMCData_1.20.0
## [9] HDF5Array_1.30.0 rhdf5_2.46.1
## [11] DelayedArray_0.28.0 SparseArray_1.2.4
## [13] S4Arrays_1.2.1 abind_1.4-5
## [15] Matrix_1.6-5 SingleCellExperiment_1.24.0
## [17] SummarizedExperiment_1.32.0 Biobase_2.62.0
## [19] GenomicRanges_1.54.1 GenomeInfoDb_1.38.8
## [21] IRanges_2.36.0 S4Vectors_0.40.2
## [23] BiocGenerics_0.48.1 MatrixGenerics_1.14.0
## [25] matrixStats_1.2.0
##
## loaded via a namespace (and not attached):
## [1] later_1.3.2 BiocIO_1.12.0
## [3] bitops_1.0-7 filelock_1.0.3
## [5] tibble_3.2.1 R.oo_1.26.0
## [7] graph_1.80.0 XML_3.99-0.16.1
## [9] lifecycle_1.0.4 scDblFinder_1.16.0
## [11] doParallel_1.0.17 edgeR_4.0.16
## [13] lattice_0.22-5 MASS_7.3-60.0.1
## [15] magrittr_2.0.3 limma_3.58.1
## [17] sass_0.4.8 rmarkdown_2.25
## [19] jquerylib_0.1.4 yaml_2.3.8
## [21] metapod_1.10.1 httpuv_1.6.14
## [23] reticulate_1.34.0 cowplot_1.1.3
## [25] RColorBrewer_1.1-3 DBI_1.2.1
## [27] zlibbioc_1.48.2 Rtsne_0.17
## [29] purrr_1.0.2 R.utils_2.12.3
## [31] RCurl_1.98-1.14 WriteXLS_6.5.0
## [33] rappdirs_0.3.3 circlize_0.4.16
## [35] GenomeInfoDbData_1.2.11 ggrepel_0.9.5
## [37] irlba_2.3.5.1 eds_1.4.0
## [39] annotate_1.80.0 dqrng_0.3.2
## [41] svglite_2.1.3 pkgdown_2.0.7
## [43] DelayedMatrixStats_1.24.0 codetools_0.2-19
## [45] DropletUtils_1.22.0 xml2_1.3.6
## [47] shape_1.4.6.1 tidyselect_1.2.0
## [49] farver_2.1.1 ScaledMatrix_1.10.0
## [51] viridis_0.6.5 BiocFileCache_2.10.1
## [53] GenomicAlignments_1.38.2 jsonlite_1.8.8
## [55] GetoptLong_1.0.5 BiocNeighbors_1.20.2
## [57] ellipsis_0.3.2 iterators_1.0.14
## [59] systemfonts_1.0.6 dbscan_1.1-12
## [61] foreach_1.5.2 tools_4.3.3
## [63] ragg_1.3.0 Rcpp_1.0.12
## [65] glue_1.7.0 gridExtra_2.3
## [67] xfun_0.41 dplyr_1.1.4
## [69] withr_3.0.0 combinat_0.0-8
## [71] BiocManager_1.30.22 fastmap_1.1.1
## [73] MCMCprecision_0.4.0 rhdf5filters_1.14.1
## [75] bluster_1.12.0 fansi_1.0.6
## [77] digest_0.6.35 rsvd_1.0.5
## [79] R6_2.5.1 mime_0.12
## [81] textshaping_0.3.7 colorspace_2.1-0
## [83] Cairo_1.6-2 RSQLite_2.3.5
## [85] R.methodsS3_1.8.2 utf8_1.2.4
## [87] generics_0.1.3 data.table_1.15.4
## [89] FNN_1.1.4 rtracklayer_1.62.0
## [91] httr_1.4.7 uwot_0.1.16
## [93] pkgconfig_2.0.3 gtable_0.3.4
## [95] blob_1.2.4 ComplexHeatmap_2.18.0
## [97] XVector_0.42.0 htmltools_0.5.7
## [99] clue_0.3-65 GSEABase_1.64.0
## [101] scales_1.3.0 png_0.1-8
## [103] enrichR_3.2 scran_1.30.2
## [105] rstudioapi_0.15.0 reshape2_1.4.4
## [107] rjson_0.2.21 curl_5.2.1
## [109] GlobalOptions_0.1.2 cachem_1.0.8
## [111] stringr_1.5.1 BiocVersion_3.18.1
## [113] parallel_4.3.3 vipor_0.4.7
## [115] AnnotationDbi_1.64.1 restfulr_0.0.15
## [117] desc_1.4.3 pillar_1.9.0
## [119] grid_4.3.3 vctrs_0.6.5
## [121] promises_1.2.1 BiocSingular_1.18.0
## [123] dbplyr_2.4.0 beachmat_2.18.1
## [125] xtable_1.8-4 cluster_2.1.6
## [127] beeswarm_0.4.0 evaluate_0.23
## [129] magick_2.8.2 cli_3.6.2
## [131] locfit_1.5-9.9 compiler_4.3.3
## [133] Rsamtools_2.18.0 rlang_1.1.3
## [135] crayon_1.5.2 labeling_0.4.3
## [137] plyr_1.8.9 fs_1.6.3
## [139] ggbeeswarm_0.7.2 stringi_1.8.3
## [141] viridisLite_0.4.2 BiocParallel_1.36.0
## [143] munsell_0.5.1 Biostrings_2.70.1
## [145] ExperimentHub_2.10.0 RcppEigen_0.3.4.0.0
## [147] GSVAdata_1.38.0 sparseMatrixStats_1.14.0
## [149] bit64_4.0.5 Rhdf5lib_1.24.1
## [151] KEGGREST_1.42.0 statmod_1.5.0
## [153] shiny_1.8.0 highr_0.10
## [155] interactiveDisplayBase_1.40.0 AnnotationHub_3.10.0
## [157] igraph_2.0.3 memoise_2.0.1
## [159] bslib_0.6.1 bit_4.0.5
## [161] xgboost_1.7.7.1
vignettes/decontX.Rmd
Droplet-based microfluidic devices have become widely used to perform single-cell RNA sequencing (scRNA-seq). However, ambient RNA present in the cell suspension can be aberrantly counted along with a cell’s native mRNA and result in cross-contamination of transcripts between different cell populations. DecontX is a Bayesian method to estimate and remove contamination in individual cells. DecontX assumes the observed expression of a cell is a mixture of counts from two multinomial distributions: (1) a distribution of native transcript counts from the cell’s actual population and (2) a distribution of contaminating transcript counts from all other cell populations captured in the assay. Overall, computational decontamination of single cell counts can aid in downstream clustering and visualization.
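The mixture assumption can be illustrated with a small simulation in base R. This sketch only generates data under the stated model and does not perform any inference. The gene probabilities, the total count, and the contamination fraction theta are hypothetical values chosen for the example.

```r
set.seed(1)

genes <- c("CD3D", "LYZ", "CD79A")

# Hypothetical expression distributions over three genes
native <- c(0.90, 0.05, 0.05)         # the cell's own population (T-cell-like)
contamination <- c(0.10, 0.60, 0.30)  # ambient profile from other populations

totalCounts <- 1000
theta <- 0.15  # hypothetical fraction of counts that are contamination

# Split the cell's counts between the two sources, then draw each part
nContam <- rbinom(1, totalCounts, theta)
nativeCounts <- rmultinom(1, totalCounts - nContam, native)[, 1]
contamCounts <- rmultinom(1, nContam, contamination)[, 1]

# Only the sum is observed in a real experiment
observed <- nativeCounts + contamCounts
names(observed) <- genes
```

In a real dataset only the observed vector is available. The goal of decontamination is to estimate how much of it came from each source and to recover the native counts.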
The package can be loaded using the library command.
library(celda)
DecontX can take either a SingleCellExperiment
object or a counts matrix as input. decontX will attempt to
convert any input matrix to class dgCMatrix from package Matrix
before starting the analysis.
To import datasets directly into an SCE object, the singleCellTK package has several importing functions for different preprocessing tools including CellRanger, STARsolo, BUStools, Optimus, DropEST, SEQC, and Alevin/Salmon. For example, the following code can be used as a template to read in the filtered and raw matrices for multiple samples processed with CellRanger:
library(singleCellTK)
sce <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"))
Within each sample directory, there should be subfolders called
"outs/filtered_feature_bc_matrix/" or
"outs/raw_feature_bc_matrix/" with files called
matrix.mtx.gz, features.tsv.gz and
barcodes.tsv.gz. If these files are in different
subdirectories, the importCellRangerV3Sample function can
be used to import data from a different directory instead.
Optionally, the “raw” or “droplet” matrix can also be easily imported
by setting the dataType argument to “raw”:
sce.raw <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"), dataType = "raw")
The raw matrix can be passed to the background parameter
in decontX as described below. If using Seurat, go to the
Working with Seurat section for details on how to
convert between SCE and Seurat objects.
We will utilize the 10X PBMC 4K dataset as an example in this vignette. This data can be easily retrieved from the package TENxPBMCData. Make sure the column names are set before running decontX.
A SingleCellExperiment (SCE) object or a sparse matrix containing the
counts for filtered cells can be passed to decontX via the
x parameter. The matrix to use in an SCE object can be
specified with the assayName parameter, which is set to
"counts" by default. There are two major ways to run
decontX: with and without the raw/droplet matrix containing empty
droplets. Here is an example of running decontX without supplying the
background:
sce <- decontX(sce)
In this scenario, decontX will estimate the
contamination distribution for each cell cluster based on the profiles
of the other cell clusters in the filtered dataset. The estimated
contamination results can be found in the
colData(sce)$decontX_contamination and the decontaminated
counts can be accessed with decontXcounts(sce).
decontX will perform heuristic clustering to quickly define
major cell clusters. However, if you have your own cell cluster labels,
they can be specified with the z parameter. These results
will be used throughout the rest of the vignette.
The raw/droplet matrix can be used to empirically estimate the
distribution of ambient RNA, which is especially useful when cells that
contributed to the ambient RNA are not accurately represented in the
filtered count matrix containing the cells. For example, cells that were
removed via flow cytometry or that were more sensitive to lysis during
dissociation may have contributed to the ambient RNA but were not
measured in the filtered/cell matrix. The raw/droplet matrix can be
input as an SCE object or a sparse matrix using the
background parameter:
sce <- decontX(sce, background = sce.raw)
Only empty droplets in the background matrix should be used to
estimate the ambient RNA. If any cell ids (i.e. colnames)
in the raw/droplet matrix supplied to the background
parameter are also found in the filtered counts matrix (x),
decontX will automatically remove them from the raw matrix. However, if
the cell ids are not available for the input matrices, decontX will
treat the entire background input as empty droplets. All of
the outputs are the same as when running decontX without setting the
background parameter.
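The barcode filtering described above can be sketched in base R. decontX performs this step internally, so the code is only meant to show the idea on toy matrices. The barcode names, gene names, and the simple pooled ambient profile are assumptions made for the example.

```r
# Toy raw/droplet matrix with 5 barcodes and a filtered matrix with 2 of them
raw <- matrix(1:10, nrow = 2,
              dimnames = list(c("geneA", "geneB"),
                              c("bc1", "bc2", "bc3", "bc4", "bc5")))
filtered <- raw[, c("bc2", "bc4")]

# Keep only barcodes that are absent from the filtered matrix
emptyIds <- setdiff(colnames(raw), colnames(filtered))
background <- raw[, emptyIds, drop = FALSE]

# Ambient profile: proportion of background counts contributed by each gene
ambient <- rowSums(background) / sum(background)
```

If the matrices had no column names, the setdiff step could not identify the cells, which is why the entire background input is then treated as empty droplets.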
Note: If the input object is just a matrix and not an SCE object, make sure to save the output into a variable with a different name (e.g.
result <- decontX(mat)). The result object will be a list with contamination in result$contamination and the decontaminated counts in result$decontXcounts.
DecontX creates a UMAP which we can use to plot the cluster labels automatically identified in the analysis. Note that the clustering approach used here is designed to find “broad” cell types rather than individual cell subpopulations within a cell type.
umap <- reducedDim(sce, "decontX_UMAP")
plotDimReduceCluster(x = sce$decontX_clusters,
dim1 = umap[, 1], dim2 = umap[, 2])
The percentage of contamination in each cell can be plotted on the UMAP to visualize which clusters may have higher levels of ambient RNA.

Known marker genes can also be plotted on the UMAP to identify the cell types for each cluster. We will use CD3D and CD3E for T-cells, LYZ, S100A8, and S100A9 for monocytes, CD79A, CD79B, and MS4A1 for B-cells, GNLY for NK-cells, and PPBP for megakaryocytes.
library(scater)
sce <- logNormCounts(sce)
plotDimReduceFeature(as.matrix(logcounts(sce)),
dim1 = umap[, 1],
dim2 = umap[, 2],
features = c("CD3D", "CD3E", "GNLY",
"LYZ", "S100A8", "S100A9",
"CD79A", "CD79B", "MS4A1"),
exactMatch = TRUE)
## Warning in asMethod(object): sparse->dense coercion: allocating vector of size
## 1.1 GiB

The percentage of cells within a cluster that have detectable
expression of marker genes can be displayed in a barplot. Markers for
cell types need to be supplied in a named list. First, the detection of
marker genes in the original counts assay is shown:
markers <- list(Tcell_Markers = c("CD3E", "CD3D"),
Bcell_Markers = c("CD79A", "CD79B", "MS4A1"),
Monocyte_Markers = c("S100A8", "S100A9", "LYZ"),
NKcell_Markers = "GNLY")
cellTypeMappings <- list(Tcells = 2, Bcells = 5, Monocytes = 1, NKcells = 6)
plotDecontXMarkerPercentage(sce,
markers = markers,
groupClusters = cellTypeMappings,
assayName = "counts")
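The quantity shown in the barplot can be approximated by hand in base R. The sketch below assumes that a cell counts as positive for a marker set when at least one marker in the set reaches the detection threshold. That rule, the toy counts, and the cluster labels are assumptions for illustration and may differ from what plotDecontXMarkerPercentage does internally.

```r
# Toy counts: 3 monocyte markers (rows) x 6 cells (columns), hypothetical values
counts <- matrix(c(3, 0, 0,
                   0, 2, 0,
                   1, 1, 4,
                   0, 0, 1,
                   0, 0, 0,
                   0, 0, 0),
                 nrow = 3,
                 dimnames = list(c("S100A8", "S100A9", "LYZ"), NULL))
clusters <- c(1, 1, 1, 2, 2, 2)
threshold <- 1

# A cell is positive if any marker in the set reaches the threshold (assumption)
positive <- colSums(counts >= threshold) > 0

# Percentage of positive cells within each cluster
pct <- 100 * tapply(positive, clusters, mean)
```

Here every cell in cluster 1 is positive, while a single stray count makes one cell in cluster 2 positive. That kind of low-level signal in an unrelated cluster is what contamination looks like in the barplot.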
We can then look to see how much decontX removed aberrant expression
of marker genes in each cell type by changing the assayName
to decontXcounts:
plotDecontXMarkerPercentage(sce,
markers = markers,
groupClusters = cellTypeMappings,
assayName = "decontXcounts")
Percentages of marker genes detected in other cell types were reduced
or completely removed. For example, the percentage of cells that
expressed Monocyte marker genes was greatly reduced in T-cells, B-cells,
and NK-cells. The original counts and decontaminated counts can be plotted
side-by-side by listing multiple assays in the assayName
parameter. This option is only available if the data is stored in a
SingleCellExperiment object.
plotDecontXMarkerPercentage(sce,
markers = markers,
groupClusters = cellTypeMappings,
assayName = c("counts", "decontXcounts"))
Some helpful hints when using
plotDecontXMarkerPercentage:
1) Cell clusters can be renamed and regrouped using the
groupClusters parameter, which also needs to be a named
list. If groupClusters is used, cell clusters not included
in the list will be excluded from the barplot. For example, if we wanted
to group T-cells and NK-cells together, we could set
cellTypeMappings <- list(NK_Tcells = c(2,6), Bcells = 5, Monocytes = 1)
2) The expression level at which a marker gene is considered detected
in a cell can be adjusted with the threshold parameter.
3) If you are not using a SingleCellExperiment, then you
will need to supply the original counts matrix or the decontaminated
counts matrix as the first argument to generate the barplots.
Another useful way to assess the amount of decontamination is to view
the expression of marker genes before and after decontX
across cell types. Here we view the monocyte markers in each cell type.
The violin plot shows that the markers have been removed from T-cells,
B-cells, and NK-cells, but are largely unaffected in monocytes.
plotDecontXMarkerExpression(sce,
markers = markers[["Monocyte_Markers"]],
groupClusters = cellTypeMappings,
ncol = 3)
Some helpful hints when using
plotDecontXMarkerExpression:
1) groupClusters works the same way as in
plotDecontXMarkerPercentage.
2) A separate panel is drawn for each combination of marker and cluster
(or cell type specified by groupClusters). Therefore, you may want
to keep the number of markers small in each plot and call the function
multiple times for different sets of marker genes.
3) Individual points can be plotted by setting
plotDots = TRUE and/or log-transformed on the fly
by setting log1p = TRUE.
4) Any assay in a SingleCellExperiment can be plotted. Therefore, you could also examine
normalized expression of the original and decontaminated counts. For
example:
library(scater)
sce <- logNormCounts(sce,
exprs_values = "decontXcounts",
name = "decontXlogcounts")
plotDecontXMarkerExpression(sce,
markers = markers[["Monocyte_Markers"]],
groupClusters = cellTypeMappings,
ncol = 3,
assayName = c("logcounts", "decontXlogcounts"))
The ability of DecontX to accurately identify contamination is
dependent on the cell cluster labels. DecontX assumes that contamination
for a cell cluster comes from a combination of counts from all other
clusters. The default clustering approach used by DecontX tends to
select fewer clusters that represent broader cell types. For example,
all T-cells tend to be clustered together rather than splitting naive
and cytotoxic T-cells into separate clusters. Custom cell type labels
can be supplied via the z parameter if some cells are not
being clustered appropriately by the default method.
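The dependence on cluster labels can be seen in a simplified base R sketch that pools the counts of all other clusters to form a contamination profile for each cluster. This is a deliberately reduced illustration of the assumption stated above, not the inference procedure that decontX runs, and the per-cluster counts are invented for the example.

```r
# Toy counts: 3 genes (rows) x 3 clusters (columns), summed over cells per cluster
clusterCounts <- matrix(c(90,  5,  5,
                           5, 80, 15,
                          10, 10, 80),
                        nrow = 3,
                        dimnames = list(c("CD3D", "LYZ", "CD79A"),
                                        c("Tcell", "Monocyte", "Bcell")))

# For each cluster, pool the counts of all other clusters and normalize
contamDist <- sapply(colnames(clusterCounts), function(k) {
  others <- rowSums(clusterCounts[, colnames(clusterCounts) != k, drop = FALSE])
  others / sum(others)
})
```

Each column of contamDist is the profile that would be treated as potential contamination for that cluster. If two distinct cell types were merged into one label, their marker genes would no longer appear in each other's contamination profile, which is why inappropriate clustering can hide real contamination.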
There are ways to force decontX to estimate more or less
contamination across a dataset by manipulating the priors. The
delta parameter is a numeric vector of length two. It is
the concentration parameter for the Dirichlet distribution which serves
as the prior for the proportions of native and contamination counts in
each cell. The first element is the prior for the proportion of native
counts while the second element is the prior for the proportion of
contamination counts. These essentially act as pseudocounts for the
native and contamination in each cell. If
estimateDelta = TRUE, delta is only used to
produce a random sample of proportions for an initial value of
contamination in each cell. Then delta is updated in each
iteration. If estimateDelta = FALSE, then
delta is fixed with these values for the entire inference
procedure. Fixing delta and setting a high number in the
second element will force decontX to be more aggressive and
estimate higher levels of contamination in each cell at the expense of
potentially removing native expression. For example, in the previous
PBMC example, we can see what the estimated delta was by
looking in the estimates:
metadata(sce)$decontX$estimates$all_cells$delta
## [1] 9.280108 1.038000
Setting a higher value in the second element of delta and
estimateDelta = FALSE will force decontX to
estimate higher levels of contamination per cell:
sce.delta <- decontX(sce, delta = c(9, 20), estimateDelta = FALSE)
plot(sce$decontX_contamination, sce.delta$decontX_contamination,
xlab = "DecontX estimated priors",
ylab = "Setting priors to estimate higher contamination")
abline(0, 1, col = "red", lwd = 2)
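One way to build intuition for the two delta settings compared above is to look at the mean of the corresponding two-component Dirichlet distribution, which is each element divided by their sum. The helper priorMean below is a hypothetical name introduced for this sketch. The calculation describes the prior only and not the final per-cell estimates, which also depend on the data.

```r
# Mean of a two-component Dirichlet: each element divided by the total
priorMean <- function(delta) delta / sum(delta)

estimated <- priorMean(c(9.280108, 1.038000))  # delta estimated by decontX above
fixed <- priorMean(c(9, 20))                   # delta fixed by the user above

# The second element is the prior expected proportion of contamination counts
estimated[2]  # about 0.10
fixed[2]      # about 0.69
```

The fixed prior places most of its mass on contamination, which is consistent with the higher per-cell estimates seen when estimateDelta = FALSE is combined with a large second element.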
If you are using the Seurat package for downstream analysis, the following code can be used to read in a matrix and convert between Seurat and SCE objects:
# Read counts from CellRanger output
library(Seurat)
counts <- Read10X("sample/outs/filtered_feature_bc_matrix/")
# Create a SingleCellExperiment object and run decontX
sce <- SingleCellExperiment(list(counts = counts))
sce <- decontX(sce)
# Create a Seurat object from a SCE with decontX results
seuratObject <- CreateSeuratObject(round(decontXcounts(sce)))
Optionally, the “raw” matrix can also be imported and used as the background:
counts.raw <- Read10X("sample/outs/raw_feature_bc_matrix/")
sce.raw <- SingleCellExperiment(list(counts = counts.raw))
sce <- decontX(sce, background = sce.raw)
Note that the decontaminated matrix of decontX consists of floating point numbers and must be rounded to integers before adding it to a Seurat object. If you already have a Seurat object containing the counts matrix and would like to run decontX, you can retrieve the count matrix, create an SCE object, run decontX, and then add the results back to the Seurat object:
counts <- GetAssayData(object = seuratObject, slot = "counts")
sce <- SingleCellExperiment(list(counts = counts))
sce <- decontX(sce)
seuratObject[["decontXcounts"]] <- CreateAssayObject(counts = decontXcounts(sce))
## R version 4.3.3 (2024-02-29)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.4.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scater_1.30.1 ggplot2_3.5.0
## [3] scuttle_1.12.0 TENxPBMCData_1.20.0
## [5] HDF5Array_1.30.0 rhdf5_2.46.1
## [7] DelayedArray_0.28.0 SparseArray_1.2.4
## [9] S4Arrays_1.2.1 abind_1.4-5
## [11] celda_1.18.2 Matrix_1.6-5
## [13] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
## [15] Biobase_2.62.0 GenomicRanges_1.54.1
## [17] GenomeInfoDb_1.38.8 IRanges_2.36.0
## [19] S4Vectors_0.40.2 BiocGenerics_0.48.1
## [21] MatrixGenerics_1.14.0 matrixStats_1.2.0
## [23] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.15.0
## [3] jsonlite_1.8.8 magrittr_2.0.3
## [5] ggbeeswarm_0.7.2 farver_2.1.1
## [7] rmarkdown_2.25 fs_1.6.3
## [9] zlibbioc_1.48.2 ragg_1.3.0
## [11] vctrs_0.6.5 memoise_2.0.1
## [13] DelayedMatrixStats_1.24.0 RCurl_1.98-1.14
## [15] htmltools_0.5.7 AnnotationHub_3.10.0
## [17] curl_5.2.1 BiocNeighbors_1.20.2
## [19] Rhdf5lib_1.24.1 sass_0.4.8
## [21] bslib_0.6.1 desc_1.4.3
## [23] plyr_1.8.9 cachem_1.0.8
## [25] mime_0.12 lifecycle_1.0.4
## [27] iterators_1.0.14 pkgconfig_2.0.3
## [29] rsvd_1.0.5 R6_2.5.1
## [31] fastmap_1.1.1 GenomeInfoDbData_1.2.11
## [33] shiny_1.8.0 digest_0.6.35
## [35] colorspace_2.1-0 AnnotationDbi_1.64.1
## [37] irlba_2.3.5.1 ExperimentHub_2.10.0
## [39] textshaping_0.3.7 RSQLite_2.3.5
## [41] beachmat_2.18.1 labeling_0.4.3
## [43] filelock_1.0.3 WriteXLS_6.5.0
## [45] fansi_1.0.6 httr_1.4.7
## [47] compiler_4.3.3 bit64_4.0.5
## [49] withr_3.0.0 doParallel_1.0.17
## [51] BiocParallel_1.36.0 viridis_0.6.5
## [53] DBI_1.2.1 highr_0.10
## [55] rappdirs_0.3.3 rjson_0.2.21
## [57] tools_4.3.3 vipor_0.4.7
## [59] beeswarm_0.4.0 interactiveDisplayBase_1.40.0
## [61] httpuv_1.6.14 MCMCprecision_0.4.0
## [63] glue_1.7.0 dbscan_1.1-12
## [65] rhdf5filters_1.14.1 promises_1.2.1
## [67] grid_4.3.3 Rtsne_0.17
## [69] reshape2_1.4.4 generics_0.1.3
## [71] gtable_0.3.4 data.table_1.15.4
## [73] ScaledMatrix_1.10.0 BiocSingular_1.18.0
## [75] utf8_1.2.4 XVector_0.42.0
## [77] RcppAnnoy_0.0.22 ggrepel_0.9.5
## [79] BiocVersion_3.18.1 foreach_1.5.2
## [81] pillar_1.9.0 stringr_1.5.1
## [83] later_1.3.2 dplyr_1.1.4
## [85] BiocFileCache_2.10.1 lattice_0.22-5
## [87] bit_4.0.5 tidyselect_1.2.0
## [89] Biostrings_2.70.1 knitr_1.45
## [91] gridExtra_2.3 bookdown_0.37
## [93] xfun_0.41 stringi_1.8.3
## [95] yaml_2.3.8 evaluate_0.23
## [97] codetools_0.2-19 RcppEigen_0.3.4.0.0
## [99] tibble_3.2.1 BiocManager_1.30.22
## [101] cli_3.6.2 uwot_0.1.16
## [103] xtable_1.8-4 systemfonts_1.0.6
## [105] munsell_0.5.1 jquerylib_0.1.4
## [107] enrichR_3.2 Rcpp_1.0.12
## [109] dbplyr_2.4.0 png_0.1-8
## [111] parallel_4.3.3 ellipsis_0.3.2
## [113] pkgdown_2.0.7 blob_1.2.4
## [115] sparseMatrixStats_1.14.0 bitops_1.0-7
## [117] viridisLite_0.4.2 scales_1.3.0
## [119] purrr_1.0.2 crayon_1.5.2
## [121] combinat_0.0-8 rlang_1.1.3
## [123] KEGGREST_1.42.0
vignettes/articles/decontX_pbmc4k.Rmd
Droplet-based microfluidic devices have become widely used to perform single-cell RNA sequencing (scRNA-seq). However, ambient RNA present in the cell suspension can be aberrantly counted along with a cell’s native mRNA and result in cross-contamination of transcripts between different cell populations. DecontX is a Bayesian method to estimate and remove contamination in individual cells. DecontX assumes the observed expression of a cell is a mixture of counts from two multinomial distributions: (1) a distribution of native transcript counts from the cell’s actual population and (2) a distribution of contaminating transcript counts from all other cell populations captured in the assay. Overall, computational decontamination of single cell counts can aid in downstream clustering and visualization.
The package can be loaded using the library command.
library(celda)
DecontX can take either a SingleCellExperiment
object or a counts matrix as input. decontX will attempt to
convert any input matrix to class dgCMatrix from package Matrix
before starting the analysis.
To import datasets directly into an SCE object, the singleCellTK package has several importing functions for different preprocessing tools including CellRanger, STARsolo, BUStools, Optimus, DropEST, SEQC, and Alevin/Salmon. For example, the following code can be used as a template to read in the filtered and raw matrices for multiple samples processed with CellRanger:
library(singleCellTK)
sce <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"))
Within each sample directory, there should be subfolders called
"outs/filtered_feature_bc_matrix/" or
"outs/raw_feature_bc_matrix/" with files called
matrix.mtx.gz, features.tsv.gz and
barcodes.tsv.gz. If these files are in different
subdirectories, the importCellRangerV3Sample function can
be used to import data from a different directory instead.
Optionally, the “raw” or “droplet” matrix can also be easily imported
by setting the dataType argument to “raw”:
sce.raw <- importCellRanger(sampleDirs = c("path/to/sample1/", "path/to/sample2/"), dataType = "raw")
The raw matrix can be passed to the background parameter
in decontX as described below. If using Seurat, go to the
Working with Seurat section for details on how to
convert between SCE and Seurat objects.
We will utilize the 10X PBMC 4K dataset as an example in this vignette. This data can be easily retrieved from the package TENxPBMCData. Make sure the column names are set before running decontX.
A SingleCellExperiment (SCE) object or a sparse matrix containing the
counts for filtered cells can be passed to decontX via the
x parameter. The matrix to use in an SCE object can be
specified with the assayName parameter, which is set to
"counts" by default. There are two major ways to run
decontX: with and without the raw/droplet matrix containing empty
droplets. Here is an example of running decontX without supplying the
background:
sce <- decontX(sce)
In this scenario, decontX will estimate the
contamination distribution for each cell cluster based on the profiles
of the other cell clusters in the filtered dataset. The estimated
contamination results can be found in the
colData(sce)$decontX_contamination and the decontaminated
counts can be accessed with decontXcounts(sce).
decontX will perform heuristic clustering to quickly define
major cell clusters. However, if you have your own cell cluster labels,
they can be specified with the z parameter. These results
will be used throughout the rest of the vignette.
The raw/droplet matrix can be used to empirically estimate the
distribution of ambient RNA, which is especially useful when cells that
contributed to the ambient RNA are not accurately represented in the
filtered count matrix containing the cells. For example, cells that were
removed via flow cytometry or that were more sensitive to lysis during
dissociation may have contributed to the ambient RNA but were not
measured in the filtered/cell matrix. The raw/droplet matrix can be
input as an SCE object or a sparse matrix using the
background parameter:
sce <- decontX(sce, background = sce.raw)
Only empty droplets in the background matrix should be used to
estimate the ambient RNA. If any cell ids (i.e. colnames)
in the raw/droplet matrix supplied to the background
parameter are also found in the filtered counts matrix (x),
decontX will automatically remove them from the raw matrix. However, if
the cell ids are not available for the input matrices, decontX will
treat the entire background input as empty droplets. All of
the outputs are the same as when running decontX without setting the
background parameter.
Note: If the input object is just a matrix and not an SCE object, make sure to save the output into a variable with a different name (e.g.
result <- decontX(mat)). The result object will be a list with contamination in result$contamination and the decontaminated counts in result$decontXcounts.
DecontX creates a UMAP which we can use to plot the cluster labels automatically identified in the analysis. Note that the clustering approach used here is designed to find “broad” cell types rather than individual cell subpopulations within a cell type.
umap <- reducedDim(sce, "decontX_UMAP")
plotDimReduceCluster(x = sce$decontX_clusters,
dim1 = umap[, 1], dim2 = umap[, 2])
The percentage of contamination in each cell can be plotted on the UMAP to visualize which clusters may have higher levels of ambient RNA.

Known marker genes can also be plotted on the UMAP to identify the cell types for each cluster. We will use CD3D and CD3E for T-cells, LYZ, S100A8, and S100A9 for monocytes, CD79A, CD79B, and MS4A1 for B-cells, GNLY for NK-cells, and PPBP for megakaryocytes.
library(scater)
sce <- logNormCounts(sce)
plotDimReduceFeature(as.matrix(logcounts(sce)),
    dim1 = umap[, 1],
    dim2 = umap[, 2],
    features = c("CD3D", "CD3E", "GNLY",
        "LYZ", "S100A8", "S100A9",
        "CD79A", "CD79B", "MS4A1"),
    exactMatch = TRUE)
## Warning in asMethod(object): sparse->dense coercion: allocating vector of size
## 1.1 GiB

The percentage of cells within a cluster that have detectable
expression of marker genes can be displayed in a barplot. Markers for
cell types need to be supplied in a named list. First, the detection of
marker genes in the original counts assay is shown:
markers <- list(Tcell_Markers = c("CD3E", "CD3D"),
Bcell_Markers = c("CD79A", "CD79B", "MS4A1"),
Monocyte_Markers = c("S100A8", "S100A9", "LYZ"),
NKcell_Markers = "GNLY")
cellTypeMappings <- list(Tcells = 2, Bcells = 5, Monocytes = 1, NKcells = 6)
plotDecontXMarkerPercentage(sce,
markers = markers,
groupClusters = cellTypeMappings,
assayName = "counts")
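To make the plotted quantity concrete, here is a minimal base R sketch of a per-cluster detection percentage. It assumes a cell counts as positive for a cell type when at least one of its marker genes reaches the threshold; the helper `percentDetected` and the toy data are hypothetical and are not part of celda.

```r
# Toy counts matrix (2 genes x 8 cells); every value here is invented.
counts <- matrix(c(4, 2, 3, 1, 1, 0, 0, 0,
                   0, 1, 0, 0, 9, 7, 8, 5),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("CD3E", "LYZ"), paste0("cell", 1:8)))
clusters <- rep(c("Tcells", "Monocytes"), each = 4)

# Percentage of cells per cluster with at least one marker at or above
# the threshold (hypothetical helper, not a celda function).
percentDetected <- function(counts, markerGenes, clusters, threshold = 1) {
    detected <- colSums(counts[markerGenes, , drop = FALSE] >= threshold) > 0
    100 * sapply(split(detected, clusters), mean)
}

tcellPct <- percentDetected(counts, "CD3E", clusters)
monoPct <- percentDetected(counts, "LYZ", clusters)
tcellPct
monoPct
```

In this toy data a single monocyte carries one CD3E count, which is the kind of low-level, off-target detection that ambient RNA can produce.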
We can then look to see how much decontX removed aberrant expression
of marker genes in each cell type by changing the assayName
to decontXcounts:
plotDecontXMarkerPercentage(sce,
markers = markers,
groupClusters = cellTypeMappings,
assayName = "decontXcounts")
Percentages of marker genes detected in other cell types were reduced
or completely removed. For example, the percentage of cells that
expressed Monocyte marker genes was greatly reduced in T-cells, B-cells,
and NK-cells. The original counts and decontaminated counts can be plotted
side-by-side by listing multiple assays in the assayName
parameter. This option is only available if the data is stored in
a SingleCellExperiment object.
plotDecontXMarkerPercentage(sce,
markers = markers,
groupClusters = cellTypeMappings,
assayName = c("counts", "decontXcounts"))
Some helpful hints when using
plotDecontXMarkerPercentage:
- Cell clusters can be renamed and regrouped using the groupClusters parameter, which also needs to be a named list. If groupClusters is used, cell clusters not included in the list will be excluded from the barplot. For example, if we wanted to group T-cells and NK-cells together, we could set
cellTypeMappings <- list(NK_Tcells = c(2,6), Bcells = 5, Monocytes = 1)
- The level at which a gene is considered detected in a cell can be adjusted using the threshold parameter.
- If you are not using a SingleCellExperiment, then you will need to supply the original counts matrix or the decontaminated counts matrix as the first argument to generate the barplots.
Another useful way to assess the amount of decontamination is to view
the expression of marker genes before and after decontX
across cell types. Here we view the monocyte markers in each cell type.
The violin plot shows that the markers have been removed from T-cells,
B-cells, and NK-cells, but are largely unaffected in monocytes.
plotDecontXMarkerExpression(sce,
markers = markers[["Monocyte_Markers"]],
groupClusters = cellTypeMappings,
ncol = 3)
Some helpful hints when using
plotDecontXMarkerExpression:
- groupClusters works the same way as in plotDecontXMarkerPercentage.
- A separate panel is drawn for each pair of marker and cluster (or cell type specified by groupClusters). Therefore, you may want to keep the number of markers small in each plot and call the function multiple times for different sets of marker genes.
- Individual points can be plotted by setting plotDots = TRUE and/or log transformed on the fly by setting log1p = TRUE.
- Any assay stored in a SingleCellExperiment can be plotted. Therefore you could also examine normalized expression of the original and decontaminated counts. For example:
library(scater)
sce <- logNormCounts(sce,
exprs_values = "decontXcounts",
name = "decontXlogcounts")
plotDecontXMarkerExpression(sce,
markers = markers[["Monocyte_Markers"]],
groupClusters = cellTypeMappings,
ncol = 3,
assayName = c("logcounts", "decontXlogcounts"))
The ability of DecontX to accurately identify contamination is
dependent on the cell cluster labels. DecontX assumes that contamination
for a cell cluster comes from a combination of counts from all other
clusters. The default clustering approach used by DecontX tends to
select fewer clusters that represent broader cell types. For example,
all T-cells tend to be clustered together rather than splitting naive
and cytotoxic T-cells into separate clusters. Custom cell type labels
can be supplied via the z parameter if some cells are not
being clustered appropriately by the default method.
There are ways to force decontX to estimate more or less
contamination across a dataset by manipulating the priors. The
delta parameter is a numeric vector of length two. It is
the concentration parameter for the Dirichlet distribution which serves
as the prior for the proportions of native and contamination counts in
each cell. The first element is the prior for the proportion of native
counts while the second element is the prior for the proportion of
contamination counts. These essentially act as pseudocounts for the
native and contamination in each cell. If
estimateDelta = TRUE, delta is only used to
produce a random sample of proportions for an initial value of
contamination in each cell. Then delta is updated in each
iteration. If estimateDelta = FALSE, then
delta is fixed with these values for the entire inference
procedure. Fixing delta and setting a high number in the
second element will force decontX to be more aggressive and
estimate higher levels of contamination in each cell at the expense of
potentially removing native expression. For example, in the previous
PBMC example, we can see what the estimated delta was by
looking in the estimates:
metadata(sce)$decontX$estimates$all_cells$delta
## [1] 9.280108 1.038000
Setting a higher value in the second element of delta and
estimateDelta = FALSE will force decontX to
estimate higher levels of contamination per cell:
sce.delta <- decontX(sce, delta = c(9, 20), estimateDelta = FALSE)
plot(sce$decontX_contamination, sce.delta$decontX_contamination,
xlab = "DecontX estimated priors",
ylab = "Setting priors to estimate higher contamination")
abline(0, 1, col = "red", lwd = 2)
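The pseudocount interpretation of delta can be illustrated with a small base R sketch. A two-element Dirichlet is a Beta distribution, so the prior mean of the contamination proportion is delta[2] / sum(delta). The posterior-mean formula below is the textbook conjugate update for a fixed delta and is an assumption made for illustration; it is not decontX's actual inference procedure, and the helper names and cell counts are hypothetical.

```r
# Prior mean of the contamination proportion implied by a two-element delta.
priorContamination <- function(delta) {
    stopifnot(length(delta) == 2, all(delta > 0))
    delta[2] / sum(delta)
}

# Conjugate posterior mean when delta is fixed: delta acts as pseudocounts
# added to the native and contamination counts of a cell.
posteriorContamination <- function(delta, native, contamination) {
    (delta[2] + contamination) / (sum(delta) + native + contamination)
}

estimated <- c(9.280108, 1.038000)  # delta reported in the estimates above
aggressive <- c(9, 20)              # delta used for sce.delta above

priorContamination(estimated)
priorContamination(aggressive)

# A hypothetical cell with 90 native and 10 contaminating counts.
posteriorContamination(estimated, native = 90, contamination = 10)
posteriorContamination(aggressive, native = 90, contamination = 10)

# Random initial contamination proportions drawn from the prior, as is
# done when estimateDelta = TRUE (the second Dirichlet component is
# Beta(delta[2], delta[1])).
set.seed(12345)
initial <- rbeta(5, shape1 = estimated[2], shape2 = estimated[1])
```

The larger second element pulls both the prior and the posterior mean toward higher contamination, and the pull is strongest for cells with few counts, where the pseudocounts dominate the observed data.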
If you are using the Seurat package for downstream analysis, the following code can be used to read in a matrix and convert between Seurat and SCE objects:
# Read counts from CellRanger output
library(Seurat)
counts <- Read10X("sample/outs/filtered_feature_bc_matrix/")
# Create a SingleCellExperiment object and run decontX
sce <- SingleCellExperiment(list(counts = counts))
sce <- decontX(sce)
# Create a Seurat object from a SCE with decontX results
seuratObject <- CreateSeuratObject(round(decontXcounts(sce)))
Optionally, the “raw” matrix can also be imported and used as the background:
counts.raw <- Read10X("sample/outs/raw_feature_bc_matrix/")
sce.raw <- SingleCellExperiment(list(counts = counts.raw))
sce <- decontX(sce, background = sce.raw)
Note that the decontaminated matrix of decontX consists of floating point numbers and must be rounded to integers before adding it to a Seurat object. If you already have a Seurat object containing the counts matrix and would like to run decontX, you can retrieve the count matrix, create an SCE object, run decontX, and then add the result back to the Seurat object:
counts <- GetAssayData(object = seuratObject, slot = "counts")
sce <- SingleCellExperiment(list(counts = counts))
sce <- decontX(sce)
seuratObject[["decontXcounts"]] <- CreateAssayObject(counts = decontXcounts(sce))
## R version 4.3.3 (2024-02-29)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.4.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scater_1.30.1 ggplot2_3.5.0
## [3] scuttle_1.12.0 TENxPBMCData_1.20.0
## [5] HDF5Array_1.30.0 rhdf5_2.46.1
## [7] DelayedArray_0.28.0 SparseArray_1.2.4
## [9] S4Arrays_1.2.1 abind_1.4-5
## [11] celda_1.18.2 Matrix_1.6-5
## [13] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
## [15] Biobase_2.62.0 GenomicRanges_1.54.1
## [17] GenomeInfoDb_1.38.8 IRanges_2.36.0
## [19] S4Vectors_0.40.2 BiocGenerics_0.48.1
## [21] MatrixGenerics_1.14.0 matrixStats_1.2.0
## [23] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.15.0
## [3] jsonlite_1.8.8 magrittr_2.0.3
## [5] ggbeeswarm_0.7.2 farver_2.1.1
## [7] rmarkdown_2.25 fs_1.6.3
## [9] zlibbioc_1.48.2 ragg_1.3.0
## [11] vctrs_0.6.5 memoise_2.0.1
## [13] DelayedMatrixStats_1.24.0 RCurl_1.98-1.14
## [15] htmltools_0.5.7 AnnotationHub_3.10.0
## [17] curl_5.2.1 BiocNeighbors_1.20.2
## [19] Rhdf5lib_1.24.1 sass_0.4.8
## [21] bslib_0.6.1 desc_1.4.3
## [23] plyr_1.8.9 cachem_1.0.8
## [25] mime_0.12 lifecycle_1.0.4
## [27] iterators_1.0.14 pkgconfig_2.0.3
## [29] rsvd_1.0.5 R6_2.5.1
## [31] fastmap_1.1.1 GenomeInfoDbData_1.2.11
## [33] shiny_1.8.0 digest_0.6.35
## [35] colorspace_2.1-0 AnnotationDbi_1.64.1
## [37] irlba_2.3.5.1 ExperimentHub_2.10.0
## [39] textshaping_0.3.7 RSQLite_2.3.5
## [41] beachmat_2.18.1 labeling_0.4.3
## [43] filelock_1.0.3 WriteXLS_6.5.0
## [45] fansi_1.0.6 httr_1.4.7
## [47] compiler_4.3.3 bit64_4.0.5
## [49] withr_3.0.0 doParallel_1.0.17
## [51] BiocParallel_1.36.0 viridis_0.6.5
## [53] DBI_1.2.1 highr_0.10
## [55] rappdirs_0.3.3 rjson_0.2.21
## [57] tools_4.3.3 vipor_0.4.7
## [59] beeswarm_0.4.0 interactiveDisplayBase_1.40.0
## [61] httpuv_1.6.14 MCMCprecision_0.4.0
## [63] glue_1.7.0 dbscan_1.1-12
## [65] rhdf5filters_1.14.1 promises_1.2.1
## [67] grid_4.3.3 Rtsne_0.17
## [69] reshape2_1.4.4 generics_0.1.3
## [71] gtable_0.3.4 data.table_1.15.4
## [73] ScaledMatrix_1.10.0 BiocSingular_1.18.0
## [75] utf8_1.2.4 XVector_0.42.0
## [77] RcppAnnoy_0.0.22 ggrepel_0.9.5
## [79] BiocVersion_3.18.1 foreach_1.5.2
## [81] pillar_1.9.0 stringr_1.5.1
## [83] later_1.3.2 dplyr_1.1.4
## [85] BiocFileCache_2.10.1 lattice_0.22-5
## [87] bit_4.0.5 tidyselect_1.2.0
## [89] Biostrings_2.70.1 knitr_1.45
## [91] gridExtra_2.3 bookdown_0.37
## [93] xfun_0.41 stringi_1.8.3
## [95] yaml_2.3.8 evaluate_0.23
## [97] codetools_0.2-19 RcppEigen_0.3.4.0.0
## [99] tibble_3.2.1 BiocManager_1.30.22
## [101] cli_3.6.2 uwot_0.1.16
## [103] xtable_1.8-4 systemfonts_1.0.6
## [105] munsell_0.5.1 jquerylib_0.1.4
## [107] enrichR_3.2 Rcpp_1.0.12
## [109] dbplyr_2.4.0 png_0.1-8
## [111] parallel_4.3.3 ellipsis_0.3.2
## [113] pkgdown_2.0.7 blob_1.2.4
## [115] sparseMatrixStats_1.14.0 bitops_1.0-7
## [117] viridisLite_0.4.2 scales_1.3.0
## [119] purrr_1.0.2 crayon_1.5.2
## [121] combinat_0.0-8 rlang_1.1.3
## [123] KEGGREST_1.42.0
“celda” stands for “CEllular Latent Dirichlet Allocation”. It is a suite of Bayesian hierarchical models and supporting functions to perform gene and cell clustering for count data generated by single cell RNA-seq platforms. This algorithm is an extension of the Latent Dirichlet Allocation (LDA) topic modeling framework that has been popular in text mining applications. This package also includes a method called DecontX which can be used to estimate and remove contamination in single cell genomic data.
To install the latest stable release of celda from Bioconductor (requires R version >= 3.6):
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("celda")
The latest stable version of celda can be installed from GitHub using devtools:
library(devtools)
install_github("campbio/celda")
The development version of celda can also be installed from GitHub using devtools:
library(devtools)
install_github("campbio/celda@devel")
NOTE For MAC OSX users, devtools::install_github() requires installation of libgit2. This can be installed via homebrew:
brew install libgit2
Also, if you receive installation errors when Rcpp is being installed and compiled, try following the steps outlined here to solve the issue:
https://thecoatlessprofessor.com/programming/cpp/r-compiler-tools-for-rcpp-on-macos/
If you are running R 4.0.0 or later version on MacOS Catalina and you see error 'wchar.h' file not found, you can try the method in this link:
https://discourse.mc-stan.org/t/dealing-with-catalina-iii/12731/5
If you are trying to install on MacOS on an Apple Silicon computer and you see the following error:
ld: warning: directory not found for option '-L/opt/gfortran/lib/gcc/x86_64-apple-darwin20.0/12.2.0'
ld: warning: directory not found for option '-L/opt/gfortran/lib'
ld: library not found for -lgfortran
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [celda.so] Error 1
ERROR: compilation failed for package ‘celda’
You can solve this by downloading and installing the gfortran pkg located here and then running the following command:
sudo /opt/gfortran/bin/gfortran-update-sdk
NOTE If you are trying to install celda using Rstudio and get this error: could not find tools necessary to compile a package, you can try this:
options(buildtools.check = function(action) TRUE)
To build the vignettes for Celda and DecontX during installation from GitHub, use the following command:
library(devtools)
install_github("campbio/celda", build_vignettes = TRUE)
Note that installation may take an extra 5-10 minutes for building of the vignettes. The Celda and DecontX vignettes can then be accessed via the following commands:
vignette("celda")
vignette("decontX")
Check out our Wiki for the developer’s guide if you want to contribute!
- Celda Development Coding Style Guide
- Celda Development Robust and Efficient Code
- Celda Development Rstudio configuration
- FAQ on how to use celda
- FAQ on package development
NEWS.md
NEWS.md file to track changes to the package.
Returns a single celdaList representing the combination of two provided celdaList objects.
appendCeldaList(list1, list2)
list1: A celda_list object.
list2: A celda_list object to be joined with list1.
A celdaList object. This object contains all resList entries and runParam records from both lists.
data(celdaCGGridSearchRes)
appendedList <- appendCeldaList(
celdaCGGridSearchRes,
celdaCGGridSearchRes
)
available models
availableModels
An object of class character of length 3.
Retrieves the final log-likelihood from all iterations of Gibbs sampling used to generate a celdaModel.
bestLogLikelihood(x, altExpName = "featureSubset")
# S4 method for SingleCellExperiment
bestLogLikelihood(x, altExpName = "featureSubset")
# S4 method for celdaModel
bestLogLikelihood(x)
Numeric. The log-likelihood at the final step of Gibbs sampling used to generate the model.
List of available Celda models with corresponding descriptions.
celda()
None
celda()
#> celda_C: Clusters the columns of a count matrix containing single-cell data into K subpopulations.
#> celda_G: Clusters the rows of a count matrix containing single-cell data into L modules.
#> celda_CG: Clusters the rows and columns of a count matrix containing single-cell data into L modules and K subpopulations, respectively.
#> celdaGridSearch: Run Celda with different combinations of parameters and multiple chains in parallel.
Example results of old celdaGridSearch on celdaCGSim
celdaCGGridSearchRes
An object as returned from old celdaGridSearch()
celda_CG model object generated from celdaCGSim using
old celda_CG function.
celdaCGMod
A celda_CG object
A deprecated example of a simulated count matrix from the celda_CG model.
celdaCGSim
A list of counts and properties as returned from old simulateCells().
Old celda_C results generated from celdaCSim
celdaCMod
A celda_C object
An old example simulated count matrix from the celda_C model.
celdaCSim
A list of counts and properties as returned from old simulateCells().
celdaClusters(x, altExpName = "featureSubset")
# S4 method for SingleCellExperiment
celdaClusters(x, altExpName = "featureSubset")
# S4 method for celdaModel
celdaClusters(x)
celdaClusters(x, altExpName = "featureSubset") <- value
# S4 method for SingleCellExperiment
celdaClusters(x, altExpName = "featureSubset") <- value
x: Can be one of a SingleCellExperiment object or a celda model object.
altExpName: The name for the altExp slot to use. Default "featureSubset".
value: Character vector of cell cluster labels for replacements. Works only if x is a SingleCellExperiment object.
One of
- Character vector if x is a SingleCellExperiment object. Contains cell cluster labels for each cell in x.
- List if x is a celda model object. Contains cell cluster labels (for celda_C and celdaCG Models) and/or feature module labels (for celda_G and celdaCG Models).
data(sceCeldaCG)
celdaClusters(sceCeldaCG)
#> [1] 1 2 2 2 1 1 3 1 1 2 3 2 4 3 2 1 2 4 4 1 3 5 3 2 1 3 3 2 3 3 5 3 2 5 5 3 4
#> [38] 4 3 2 1 2 1 1 2 3 4 2 5 3 5 1 1 3 1 3 3 1 4 5 4 4 1 3 5 2 5 2 1 3 1 2 4 1
#> [75] 5 2 1 3 4 4 3 5 1 1 4 4 4 1 1 3 3 1 3 1 1 4 4 3 5 3 4 3 4 4 1 3 4 4 1 3 1
#> [112] 3 4 3 1 3 3 3 3 3 5 4 4 4 4 1 1 4 1 4 1 4 1 1 1 5 4 1 4 3 5 4 4 5 4 3 3 3
#> [149] 1 4 4 4 1 4 1 4 3 3 5 4 1 1 4 4 3 4 1 3 2 4 4 3 1 4 1 5 1 3 4 5 1 4 4 3 4
#> [186] 3 5 4 5 5 5 5 5 5 2 5 1 2 3 2 5 5 5 2 1 5 5 2 4 2 1 1 5 5 5 5 5 5 2 2 5 2
#> [223] 5 1 2 5 1 5 2 5 5 5 1 2 1 5 2 5 3 5 5 2 3 5 5 1 3 2 5 5 5 2 5 4 5 5 5 5 5
#> [260] 5 1 3 2 5 2 5 2 3 2 5 2 5 5 1 1 5 5 1 4 5 5 5 3 3 1 2 1 2 3 3 2 1 1 3 1 1
#> [297] 1 3 1 3 3 3 2 3 3 5 5 1 1 3 3 3 1 3 3 3 3 1 1 3 3 3 1 3 5 2 1 1 1 1 1 3 1
#> [334] 2 3 3 1 3 5 1 3 1 3 5 3 3 3 1 1 5 1 3 3 3 4 1 4 3 4 3 1 2 1 1 4 2 1 4 4 3
#> [371] 5 1 4 5 1 3 5 3 3 1 3 5 1 4 4 4 3 3 1 3 1 5 1 3 3 5 3 1 1 1 3 1 2 1 2 4 1
#> [408] 2 4 3 1 4 1 5 1 3 2 1 5 2 1 5 2 4 1
#> Levels: 1 2 3 4 5
data(celdaCGMod)
celdaClusters(celdaCGMod)
#> $z
#> [1] 2 1 1 1 2 2 3 2 2 1 3 1 4 3 1 2 1 4 4 2 3 5 3 1 2 3 3 1 3 3 5 3 1 5 5 3 4
#> [38] 4 3 1 2 1 2 2 1 3 4 1 5 3 5 2 2 3 2 3 3 2 4 5 4 4 2 3 5 1 5 1 2 3 2 1 4 2
#> [75] 5 1 2 3 4 4 3 5 2 2 4 4 4 2 2 3 3 2 3 2 2 4 4 3 5 3 4 3 4 4 2 3 4 4 2 3 2
#> [112] 3 4 3 2 3 3 3 3 3 5 4 4 4 4 2 2 4 2 4 2 4 2 2 2 5 4 2 4 3 5 4 4 5 4 3 3 3
#> [149] 2 4 4 4 2 4 2 4 3 3 5 4 2 2 4 4 3 4 2 3 1 4 4 3 2 4 2 5 2 3 4 5 2 4 4 3 4
#> [186] 3 5 4 5 5 5 5 5 5 1 5 2 1 3 1 5 5 5 1 2 5 5 1 4 1 2 2 5 5 5 5 5 5 1 1 5 1
#> [223] 5 2 1 5 2 5 1 5 5 5 2 1 2 5 1 5 3 5 5 1 3 5 5 2 3 1 5 5 5 1 5 4 5 5 5 5 5
#> [260] 5 2 3 1 5 1 5 1 3 1 5 1 5 5 2 2 5 5 2 4 5 5 5 3 3 2 1 2 1 3 3 1 2 2 3 2 2
#> [297] 2 3 2 3 3 3 1 3 3 5 5 2 2 3 3 3 2 3 3 3 3 2 2 3 3 3 2 3 5 1 2 2 2 2 2 3 2
#> [334] 1 3 3 2 3 5 2 3 2 3 5 3 3 3 2 2 5 2 3 3 3 4 2 4 3 4 3 2 1 2 2 4 1 2 4 4 3
#> [371] 5 2 4 5 2 3 5 3 3 2 3 5 2 4 4 4 3 3 2 3 2 5 2 3 3 5 3 2 2 2 3 2 1 2 1 4 2
#> [408] 1 4 3 2 4 2 5 2 3 1 2 5 1 2 5 1 4 2
#>
#> $y
#> [1] 7 5 1 8 1 4 10 9 2 4 7 10 10 5 3 10 2 10 9 5 9 3 5 7 6
#> [26] 9 9 4 8 2 9 5 4 5 9 5 5 4 4 6 5 8 8 5 1 6 10 9 7 1
#> [51] 4 3 7 9 6 10 9 10 7 6 8 6 9 5 4 5 9 9 10 5 4 3 8 7 9
#> [76] 6 2 3 7 6 3 4 9 3 9 10 4 10 7 1 1 10 10 1 6 1 6 9 5 4
#>
Old celda_G results generated from celdaGSim
celdaGMod
A celda_G object
An old example simulated count matrix from the celda_G model.
celdaGSim
A list of counts and properties as returned from old simulateCells()
Run Celda with different combinations of parameters and
multiple chains in parallel. The variable availableModels contains
the potential models that can be utilized. Different parameters to be tested
should be stored in a list and passed to the argument paramsTest.
Fixed parameters to be used in all models, such as sampleLabel, can
be passed as a list to the argument paramsFixed. When
verbose = TRUE, output from each chain will be sent to a log file
but not be displayed in stdout.
celdaGridSearch(
x,
useAssay = "counts",
altExpName = "featureSubset",
model,
paramsTest,
paramsFixed = NULL,
maxIter = 200,
nchains = 3,
cores = 1,
bestOnly = TRUE,
seed = 12345,
perplexity = TRUE,
verbose = TRUE,
logfilePrefix = "Celda"
)
# S4 method for SingleCellExperiment
celdaGridSearch(
x,
useAssay = "counts",
altExpName = "featureSubset",
model,
paramsTest,
paramsFixed = NULL,
maxIter = 200,
nchains = 3,
cores = 1,
bestOnly = TRUE,
seed = 12345,
perplexity = TRUE,
verbose = TRUE,
logfilePrefix = "Celda"
)
# S4 method for matrix
celdaGridSearch(
x,
useAssay = "counts",
altExpName = "featureSubset",
model,
paramsTest,
paramsFixed = NULL,
maxIter = 200,
nchains = 3,
cores = 1,
bestOnly = TRUE,
seed = 12345,
perplexity = TRUE,
verbose = TRUE,
logfilePrefix = "Celda"
)
x: A numeric matrix of counts or a SingleCellExperiment with the matrix located in the assay slot under useAssay. Rows represent features and columns represent cells.
useAssay: A string specifying the name of the assay slot to use. Default "counts".
altExpName: The name for the altExp slot to use. Default "featureSubset".
model: Celda model. Options available in availableModels.
paramsTest: List. A list denoting the combinations of parameters to run in a celda model. For example, list(K = seq(5, 10), L = seq(15, 20)) will run all combinations of K from 5 to 10 and L from 15 to 20 in model celda_CG.
paramsFixed: List. A list denoting additional parameters to use in each celda model. Default NULL.
maxIter: Integer. Maximum number of iterations of sampling to perform. Default 200.
nchains: Integer. Number of random cluster initializations. Default 3.
cores: Integer. The number of cores to use for parallel estimation of chains. Default 1.
bestOnly: Logical. Whether to return only the chain with the highest log likelihood per combination of parameters or return all chains. Default TRUE.
seed: Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. Seed values seq(seed, (seed + nchains - 1)) will be supplied to each chain in nchains. If NULL, no calls to with_seed are made.
perplexity: Logical. Whether to calculate perplexity for each model. If FALSE, then perplexity can be calculated later with resamplePerplexity. Default TRUE.
verbose: Logical. Whether to print log messages during celda chain execution. Default TRUE.
logfilePrefix: Character. Prefix for log files from worker threads and main process. Default "Celda".
A SingleCellExperiment object. Function parameter settings and celda model results are stored in the
metadata
"celda_grid_search" slot.
celda_G for feature clustering, celda_C for
clustering of cells, and celda_CG for simultaneous clustering of
features and cells. subsetCeldaList can subset the celdaList
object. selectBestModel can get the best model for each combination
of parameters.
if (FALSE) {
data(celdaCGSim)
## Run various combinations of parameters with 'celdaGridSearch'
celdaCGGridSearchRes <- celdaGridSearch(celdaCGSim$counts,
model = "celda_CG",
paramsTest = list(K = seq(4, 6), L = seq(9, 11)),
paramsFixed = list(sampleLabel = celdaCGSim$sampleLabel),
bestOnly = TRUE,
nchains = 1,
cores = 1)
}
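The size of a grid search follows directly from the documented arguments, and can be worked out in base R before launching a long run. The sketch below only enumerates what paramsTest, nchains, seed, and bestOnly describe; it does not call celda, the log-likelihood values are random placeholders, and celdaGridSearch's internal bookkeeping may differ.

```r
# Parameter combinations implied by paramsTest (example from the
# argument description above).
paramsTest <- list(K = seq(5, 10), L = seq(15, 20))
paramGrid <- expand.grid(paramsTest)
nrow(paramGrid)

# Seeds supplied to the chains: seq(seed, seed + nchains - 1).
seed <- 12345
nchains <- 3
chainSeeds <- seq(seed, seed + nchains - 1)

# One row per (K, L, chain); merge() with no shared columns gives the
# Cartesian product.
runs <- merge(paramGrid,
              data.frame(chain = seq_len(nchains), seed = chainSeeds))
nrow(runs)

# Placeholder log-likelihoods (made up), then keep the best chain per
# (K, L) combination, as bestOnly = TRUE is described to do.
set.seed(1)
runs$logLik <- rnorm(nrow(runs), mean = -1e5, sd = 100)
best <- do.call(rbind, lapply(split(runs, list(runs$K, runs$L)),
    function(d) d[which.max(d$logLik), ]))
nrow(best)
```

Because the number of model fits is the number of parameter combinations times nchains, widening either K or L by a few values multiplies the run time accordingly.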
Render a stylable heatmap of count data based on celda clustering results.
celdaHeatmap(
sce,
useAssay = "counts",
altExpName = "featureSubset",
featureIx = NULL,
nfeatures = 25,
...
)
# S4 method for SingleCellExperiment
celdaHeatmap(
sce,
useAssay = "counts",
altExpName = "featureSubset",
featureIx = NULL,
nfeatures = 25,
...
)
sce: A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG.
useAssay: A string specifying which assay slot to use. Default "counts".
altExpName: The name for the altExp slot to use. Default "featureSubset".
featureIx: Integer vector. Select features for display in heatmap. If NULL, no subsetting will be performed. Default NULL. Only used for sce containing celda_C model result returned by celda_C.
nfeatures: Integer. Maximum number of features to select for each gene module. Default 25. Only used for sce containing celda_CG or celda_G model results returned by celda_CG or celda_G.
...: Additional parameters passed to plotHeatmap.
A list containing dendrogram information and the heatmap grob.
`celdaTsne()` for generating 2-dimensional tSNE coordinates
data(sceCeldaCG)
celdaHeatmap(sceCeldaCG)
#> TableGrob (5 x 6) "layout": 9 grobs
#> z cells name grob
#> 1 1 (2-2,3-3) col_tree polyline[GRID.polyline.16]
#> 2 2 (4-4,1-1) row_tree polyline[GRID.polyline.17]
#> 3 3 (4-4,3-3) matrix gTree[GRID.gTree.19]
#> 4 4 (3-3,3-3) col_annotation rect[GRID.rect.20]
#> 5 5 (3-3,4-4) col_annotation_names text[GRID.text.21]
#> 6 6 (4-4,2-2) row_annotation rect[GRID.rect.22]
#> 7 7 (5-5,2-2) row_annotation_names text[GRID.text.23]
#> 8 8 (4-5,6-6) annotationLegend gTree[GRID.gTree.31]
#> 9 9 (4-5,5-5) legend gTree[GRID.gTree.34]
celdaModel(sce, altExpName = "featureSubset")
# S4 method for SingleCellExperiment
celdaModel(sce, altExpName = "featureSubset")
sce: A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG.
altExpName: The name for the altExp slot to use. Default "featureSubset".
Character. The celda model. Can be one of "celda_C", "celda_G", or "celda_CG".
data(sceCeldaCG)
celdaModel(sceCeldaCG)
#> [1] "celda_CG"
celdaModules(sce, altExpName = "featureSubset")
# S4 method for SingleCellExperiment
celdaModules(sce, altExpName = "featureSubset")
celdaModules(sce, altExpName = "featureSubset") <- value
# S4 method for SingleCellExperiment
celdaModules(sce, altExpName = "featureSubset") <- value
sce: A SingleCellExperiment object returned by celda_G, or celda_CG, with the matrix located in the useAssay assay slot. Rows represent features and columns represent cells.
altExpName: The name for the altExp slot to use. Default "featureSubset".
value: Character vector of feature module labels for replacements. Works only if sce is a SingleCellExperiment object.
Character vector. Contains feature module labels for each feature in sce.
data(sceCeldaCG)
celdaModules(sceCeldaCG)
#> [1] 7 5 1 8 1 4 10 9 2 4 7 10 10 5 3 10 2 10 9 5 9 3 5 7 6
#> [26] 9 4 8 2 9 5 4 5 9 5 5 4 4 6 5 8 8 5 1 6 10 9 7 1 4
#> [51] 3 7 9 6 9 10 7 6 8 6 9 5 4 5 9 9 10 5 3 8 7 9 6 2 3
#> [76] 7 6 3 4 9 3 9 10 4 10 7 1 1 10 10 1 6 6 9 5 4
#> Levels: 1 2 3 4 5 6 7 8 9 10
Returns perplexity for each model in a celdaList as calculated by `perplexity()`.
# S4 method for celdaList
celdaPerplexity(celdaList)
celdaList: An object of class celdaList.
List. Contains one celdaModel object for each of the parameters specified in the `runParams()` of the provided celda list.
data(celdaCGGridSearchRes)
celdaCGGridModelPerplexities <- celdaPerplexity(celdaCGGridSearchRes)
Returns perplexity for each model in a celdaList as calculated by `perplexity()`.
celdaPerplexity(celdaList)
celdaList: An object of class celdaList.
List. Contains one celdaModel object for each of the parameters specified in the `runParams()` of the provided celda list.
data(celdaCGGridSearchRes)
celdaCGGridModelPerplexities <- celdaPerplexity(celdaCGGridSearchRes)
Renders probability and relative expression heatmaps to visualize the relationship between features and cell populations (or cell populations and samples).
celdaProbabilityMap(
sce,
useAssay = "counts",
altExpName = "featureSubset",
level = c("cellPopulation", "sample"),
ncols = 100,
col2 = circlize::colorRamp2(c(-2, 0, 2), c("#1E90FF", "#FFFFFF", "#CD2626")),
title1 = "Absolute probability",
title2 = "Relative expression",
showColumnNames = TRUE,
showRowNames = TRUE,
rowNamesgp = grid::gpar(fontsize = 8),
colNamesgp = grid::gpar(fontsize = 12),
clusterRows = FALSE,
clusterColumns = FALSE,
showHeatmapLegend = TRUE,
heatmapLegendParam = list(title = NULL, legend_height = grid::unit(6, "cm")),
...
)
# S4 method for SingleCellExperiment
celdaProbabilityMap(
sce,
useAssay = "counts",
altExpName = "featureSubset",
level = c("cellPopulation", "sample"),
ncols = 100,
col2 = circlize::colorRamp2(c(-2, 0, 2), c("#1E90FF", "#FFFFFF", "#CD2626")),
title1 = "Absolute probability",
title2 = "Relative expression",
showColumnNames = TRUE,
showRowNames = TRUE,
rowNamesgp = grid::gpar(fontsize = 8),
colNamesgp = grid::gpar(fontsize = 12),
clusterRows = FALSE,
clusterColumns = FALSE,
showHeatmapLegend = TRUE,
heatmapLegendParam = list(title = NULL, legend_height = grid::unit(6, "cm")),
...
)
sce: A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG.
useAssay: A string specifying which assay slot to use. Default "counts".
altExpName: The name for the altExp slot to use. Default "featureSubset".
level: Character. One of "cellPopulation" or "sample". "cellPopulation" will display the absolute probabilities and relative normalized expression of each module in each cell population. level = "cellPopulation" only works for celda_CG sce objects. "sample" will display the absolute probabilities and relative normalized abundance of each cell population in each sample. Default "cellPopulation".
ncols: The number of colors (>1) to be in the color palette of the absolute probability heatmap.
col2: Passed to col argument of Heatmap. Set color boundaries and colors for the relative expression heatmap.
title1: Passed to column_title argument of Heatmap. Figure title for the absolute probability heatmap.
title2: Passed to column_title argument of Heatmap. Figure title for the relative expression heatmap.
showColumnNames: Passed to show_column_names argument of Heatmap. Show column names.
showRowNames: Passed to show_row_names argument of Heatmap. Show row names.
rowNamesgp: Passed to row_names_gp argument of Heatmap. Set row name font.
colNamesgp: Passed to column_names_gp argument of Heatmap. Set column name font.
clusterRows: Passed to cluster_rows argument of Heatmap. Cluster rows.
clusterColumns: Passed to cluster_columns argument of Heatmap. Cluster columns.
showHeatmapLegend: Passed to show_heatmap_legend argument of Heatmap. Show heatmap legend.
heatmapLegendParam: Passed to heatmap_legend_param argument of Heatmap. Heatmap legend parameters.
...: Additional parameters passed to Heatmap.
data(sceCeldaCG)
celdaProbabilityMap(sceCeldaCG)
Embeds cells in two dimensions using Rtsne based
on a celda model. For celda_C sce objects, PCA on the normalized
counts is used to reduce the number of features before applying t-SNE. For
celda_CG and celda_G sce objects, tSNE is run on module
probabilities to reduce the number of features instead of using PCA.
Module probabilities are square-root transformed before applying tSNE.
celdaTsne(
sce,
useAssay = "counts",
altExpName = "featureSubset",
maxCells = NULL,
minClusterSize = 100,
initialDims = 20,
modules = NULL,
perplexity = 20,
maxIter = 2500,
normalize = "proportion",
scaleFactor = NULL,
transformationFun = sqrt,
seed = 12345
)
# S4 method for SingleCellExperiment
celdaTsne(
sce,
useAssay = "counts",
altExpName = "featureSubset",
maxCells = NULL,
minClusterSize = 100,
initialDims = 20,
modules = NULL,
perplexity = 20,
maxIter = 2500,
normalize = "proportion",
scaleFactor = NULL,
transformationFun = sqrt,
seed = 12345
)
A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG.
A string specifying which assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Integer. Maximum number of cells to plot. Cells will be
randomly subsampled if ncol(counts) > maxCells. Larger numbers of
cells require more memory. If NULL, no subsampling will be
performed. Default NULL.
Integer. Do not subsample cell clusters below this threshold. Default 100.
Integer. PCA will be used to reduce the dimensionality of the dataset. The top 'initialDims' principal components will be used for tSNE. Default 20.
Integer vector. Determines which feature modules to use for
tSNE. If NULL, all modules will be used. Default NULL.
Numeric. Perplexity parameter for tSNE. Default 20.
Integer. Maximum number of iterations in tSNE generation. Default 2500.
Character. Passed to normalizeCounts in normalization step. Divides counts by the library sizes for each cell. One of 'proportion', 'cpm', 'median', or 'mean'. 'proportion' uses the total counts for each cell as the library size. 'cpm' divides the library size of each cell by one million to produce counts per million. 'median' divides the library size of each cell by the median library size across all cells. 'mean' divides the library size of each cell by the mean library size across all cells.
Numeric. Sets the scale factor for cell-level
normalization. Each cell is multiplied by this scale factor after its
library size has been adjusted in normalize. Default
NULL which means no scale factor is applied.
Function. Applies a transformation such as 'sqrt',
'log', 'log2', 'log10', or 'log1p'. If NULL, no transformation will
be applied. Occurs after applying normalization and scale factor. Default
sqrt.
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
sce with t-SNE coordinates
(columns "celda_tSNE1" & "celda_tSNE2") added to
reducedDim(sce, "celda_tSNE").
data(sceCeldaCG)
tsneRes <- celdaTsne(sceCeldaCG)
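The normalize, scaleFactor, and transformationFun arguments are applied in that order. The following base-R sketch illustrates the documented 'proportion' option on a toy matrix. It is not the package's normalizeCounts implementation; the toy matrix and the helper name normalizeSketch are assumptions made for illustration.

```r
# Toy counts: 3 features (rows) x 2 cells (columns).
counts <- matrix(c(1, 3, 6,
                   2, 2, 4),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))

# Hypothetical helper, not part of celda: divide each cell by its library
# size ('proportion'), optionally multiply by a scale factor, then transform.
normalizeSketch <- function(counts, scaleFactor = NULL,
                            transformationFun = sqrt) {
  libSize <- colSums(counts)
  res <- sweep(counts, 2, libSize, "/")
  if (!is.null(scaleFactor)) {
    res <- res * scaleFactor
  }
  if (!is.null(transformationFun)) {
    res <- transformationFun(res)
  }
  res
}

norm <- normalizeSketch(counts)
# With the square-root transform and no scale factor, the squared values
# in each column sum to 1.
colSums(norm^2)
```

Passing transformationFun = NULL to the sketch returns the plain proportions, which mirrors the documented behavior of skipping the transformation step.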
Embeds cells in two dimensions using umap based on
a celda model. For celda_C sce objects, PCA on the normalized counts
is used to reduce the number of features before applying UMAP. For celda_CG
sce objects, UMAP is run on module probabilities to reduce the number
of features instead of using PCA. Module probabilities are square-root
transformed before applying UMAP.
celdaUmap(
sce,
useAssay = "counts",
altExpName = "featureSubset",
maxCells = NULL,
minClusterSize = 100,
modules = NULL,
seed = 12345,
nNeighbors = 30,
minDist = 0.75,
spread = 1,
pca = TRUE,
initialDims = 50,
normalize = "proportion",
scaleFactor = NULL,
transformationFun = sqrt,
cores = 1,
...
)
# S4 method for SingleCellExperiment
celdaUmap(
sce,
useAssay = "counts",
altExpName = "featureSubset",
maxCells = NULL,
minClusterSize = 100,
modules = NULL,
seed = 12345,
nNeighbors = 30,
minDist = 0.75,
spread = 1,
pca = TRUE,
initialDims = 50,
normalize = "proportion",
scaleFactor = NULL,
transformationFun = sqrt,
cores = 1,
...
)
A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG.
A string specifying which assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Integer. Maximum number of cells to plot. Cells will be
randomly subsampled if ncol(sce) > maxCells. Larger numbers of cells
require more memory. If NULL, no subsampling will be performed.
Default NULL.
Integer. Do not subsample cell clusters below this threshold. Default 100.
Integer vector. Determines which feature modules to use for UMAP. If NULL, all modules will be used. Default NULL.
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
The size of local neighborhood used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. Default 30. See umap for more information.
The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. Default 0.75. See umap for more information.
The effective scale of embedded points. In combination with
minDist, this determines how clustered/clumped the
embedded points are. Default 1. See umap for more information.
Logical. Whether to perform
dimensionality reduction with PCA before UMAP. Only works for celda_C
sce objects. Default TRUE.
Integer. Number of dimensions from PCA to use as
input in UMAP. Default 50. Only works for celda_C sce objects.
Character. Passed to normalizeCounts in normalization step. Divides counts by the library sizes for each cell. One of 'proportion', 'cpm', 'median', or 'mean'. 'proportion' uses the total counts for each cell as the library size. 'cpm' divides the library size of each cell by one million to produce counts per million. 'median' divides the library size of each cell by the median library size across all cells. 'mean' divides the library size of each cell by the mean library size across all cells.
Numeric. Sets the scale factor for cell-level
normalization. Each cell is multiplied by this scale factor after its
library size has been adjusted in normalize. Default
NULL which means no scale factor is applied.
Function. Applies a transformation such as 'sqrt',
'log', 'log2', 'log10', or 'log1p'. If NULL, no transformation will
be applied. Occurs after applying normalization and scale factor. Default
sqrt.
Number of threads to use. Default 1.
Additional parameters to pass to umap.
sce with UMAP coordinates
(columns "celda_UMAP1" & "celda_UMAP2") added to
reducedDim(sce, "celda_UMAP").
data(sceCeldaCG)
umapRes <- celdaUmap(sceCeldaCG)
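The interaction between maxCells and minClusterSize can be pictured with a small base-R sketch. This is an illustration only, not celda's subsampling code: the helper subsampleSketch, the proportional allocation rule, and the choice to also keep large clusters at or above minClusterSize are all assumptions. Because small clusters are protected, the number of cells kept in this sketch can exceed maxCells.

```r
# Hypothetical helper, not part of celda: pick the cell indices to keep.
subsampleSketch <- function(z, maxCells = NULL, minClusterSize = 100) {
  n <- length(z)
  if (is.null(maxCells) || n <= maxCells) {
    return(seq_len(n))
  }
  fraction <- maxCells / n
  keep <- lapply(split(seq_len(n), z), function(idx) {
    size <- length(idx)
    # Clusters at or below minClusterSize are kept whole; larger clusters
    # are sampled proportionally but never reduced below minClusterSize.
    target <- max(min(size, minClusterSize), floor(size * fraction))
    idx[sample.int(size, target)]
  })
  sort(unlist(keep, use.names = FALSE))
}

set.seed(12345)
# Made-up cluster labels: one large cluster and one small cluster.
z <- rep(c(1, 2), times = c(1000, 50))
kept <- subsampleSketch(z, maxCells = 300, minClusterSize = 100)
table(z[kept])
```

In this toy case the small cluster of 50 cells is kept in full, while the large cluster is reduced in proportion to maxCells.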
Clusters the columns of a count matrix containing single-cell
data into K subpopulations. The
useAssay assay slot in
altExpName altExp slot will be used if
it exists. Otherwise, the useAssay
assay slot in x will be used if
x is a SingleCellExperiment object.
celda_C(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
K,
alpha = 1,
beta = 1,
algorithm = c("EM", "Gibbs"),
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
zInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
zInit = NULL,
logfile = NULL,
verbose = TRUE
)
# S4 method for SingleCellExperiment
celda_C(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
K,
alpha = 1,
beta = 1,
algorithm = c("EM", "Gibbs"),
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
zInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
zInit = NULL,
logfile = NULL,
verbose = TRUE
)
# S4 method for ANY
celda_C(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
K,
alpha = 1,
beta = 1,
algorithm = c("EM", "Gibbs"),
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
zInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
zInit = NULL,
logfile = NULL,
verbose = TRUE
)
A SingleCellExperiment
with the matrix located in the assay slot under useAssay.
Rows represent features and columns represent cells. Alternatively,
any matrix-like object that can be coerced to a sparse matrix of class
"dgCMatrix" can be directly used as input. The matrix will automatically be
converted to a SingleCellExperiment object.
A string specifying the name of the assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Vector or factor. Denotes the sample label for each cell (column) in the count matrix.
Integer. Number of cell populations.
Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Default 1.
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature in each cell population. Default 1.
String. Algorithm to use for clustering cell subpopulations. One of 'EM' or 'Gibbs'. The EM algorithm is faster, especially for larger numbers of cells. However, more chains may be required to ensure a good solution is found. If 'EM' is selected, then 'stopIter' will be automatically set to 1. Default 'EM'.
Integer. Number of iterations without improvement in the log likelihood to stop inference. Default 10.
Integer. Maximum number of iterations of Gibbs sampling or EM to perform. Default 200.
Integer. On every `splitOnIter` iteration, a heuristic will be applied to determine if a cell population should be reassigned and another cell population should be split into two clusters. To disable splitting, set to -1. Default 10.
Logical. After `stopIter` iterations have been performed without improvement, a heuristic will be applied to determine if a cell population should be reassigned and another cell population should be split into two clusters. If a split occurs, then `stopIter` will be reset. Default TRUE.
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
Integer. Number of random cluster initializations. Default 3.
Character. One of 'random', 'split', or 'predefined'. With 'random', cells are randomly assigned to populations. With 'split', cells will be split into sqrt(K) populations and then each population will be subsequently split into another sqrt(K) populations. With 'predefined', values in `zInit` will be used to initialize `z`. Default 'split'.
Character. An MD5 checksum for the `counts` matrix. Default NULL.
Integer vector. Sets initial starting values of z. 'zInit' is only used when `zInitialize = 'predefined'`. Default NULL.
Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL.
Logical. Whether to print log messages. Default TRUE.
A SingleCellExperiment object. Function parameter settings are stored in the metadata
"celda_parameters" slot.
Columns celda_sample_label and celda_cell_cluster in
colData contain sample labels and celda cell population clusters.
celda_G for feature clustering and celda_CG for simultaneous clustering of features and cells. celdaGridSearch can be used to run multiple values of K and multiple chains in parallel.
data(celdaCSim)
sce <- celda_C(celdaCSim$counts,
K = celdaCSim$K,
sampleLabel = celdaCSim$sampleLabel,
nchains = 1)
#> --------------------------------------------------
#> Starting Celda_C: Clustering cells.
#> --------------------------------------------------
#> Tue Apr 2 18:54:46 2024 .. Initializing 'z' in chain 1 with 'split'
#> Tue Apr 2 18:54:46 2024 .... Completed iteration: 1 | logLik: -1282027.27277705
#> Tue Apr 2 18:54:46 2024 .... Completed iteration: 2 | logLik: -1282027.27277705
#> Tue Apr 2 18:54:46 2024 .. Finished chain 1
#> --------------------------------------------------
#> Completed Celda_C. Total time: 0.1405442 secs
#> --------------------------------------------------
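The role of alpha as a pseudocount can be illustrated with a small base-R calculation. This is a sketch under the assumption that Theta is summarized as the smoothed proportion of each sample's cells that fall in each population; the labels below are made up, and the code is not taken from celda.

```r
# Made-up labels for 8 cells: sample membership and cluster assignment.
sampleLabel <- c("s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2")
z <- c(1, 1, 1, 2, 2, 2, 2, 2)
K <- 2
alpha <- 1

# Number of cells from each sample in each population (samples x K).
nCells <- table(sampleLabel, factor(z, levels = seq_len(K)))

# Add the pseudocount to every sample/population pair, then normalize rows.
theta <- (nCells + alpha) / rowSums(nCells + alpha)
theta
```

Without the pseudocount, population 1 would receive probability 0 in sample s2 because no s2 cell is assigned to it; with alpha = 1 it receives a small nonzero probability instead.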
Clusters the rows and columns of a count matrix containing
single-cell data into L modules and K subpopulations, respectively. The
useAssay assay slot in
altExpName altExp slot will be used if
it exists. Otherwise, the useAssay
assay slot in x will be used if
x is a SingleCellExperiment object.
celda_CG(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
K,
L,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
algorithm = c("EM", "Gibbs"),
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
zInitialize = c("split", "random", "predefined"),
yInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
zInit = NULL,
yInit = NULL,
logfile = NULL,
verbose = TRUE
)
# S4 method for SingleCellExperiment
celda_CG(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
K,
L,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
algorithm = c("EM", "Gibbs"),
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
zInitialize = c("split", "random", "predefined"),
yInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
zInit = NULL,
yInit = NULL,
logfile = NULL,
verbose = TRUE
)
# S4 method for ANY
celda_CG(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
K,
L,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
algorithm = c("EM", "Gibbs"),
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
zInitialize = c("split", "random", "predefined"),
yInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
zInit = NULL,
yInit = NULL,
logfile = NULL,
verbose = TRUE
)
A SingleCellExperiment
with the matrix located in the assay slot under useAssay.
Rows represent features and columns represent cells. Alternatively,
any matrix-like object that can be coerced to a sparse matrix of class
"dgCMatrix" can be directly used as input. The matrix will automatically be
converted to a SingleCellExperiment object.
A string specifying the name of the assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Vector or factor. Denotes the sample label for each cell (column) in the count matrix.
Integer. Number of cell populations.
Integer. Number of feature modules.
Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Default 1.
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell population. Default 1.
Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1.
Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1.
String. Algorithm to use for clustering cell subpopulations. One of 'EM' or 'Gibbs'. The EM algorithm for cell clustering is faster, especially for larger numbers of cells. However, more chains may be required to ensure a good solution is found. Default 'EM'.
Integer. Number of iterations without improvement in the log likelihood to stop inference. Default 10.
Integer. Maximum number of iterations of Gibbs sampling to perform. Default 200.
Integer. On every splitOnIter iteration,
a heuristic
will be applied to determine if a cell population or feature module should
be reassigned and another cell population or feature module should be split
into two clusters. To disable splitting, set to -1. Default 10.
Logical. After stopIter iterations have been
performed without improvement, a heuristic will be applied to determine if
a cell population or feature module should be reassigned and another cell
population or feature module should be split into two clusters. If a split
occurs, then 'stopIter' will be reset. Default TRUE.
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
Integer. Number of random cluster initializations. Default 3.
Character. One of 'random', 'split', or 'predefined'.
With 'random', cells are randomly assigned to populations. With 'split',
cells will be split into sqrt(K) populations and then each population will
be subsequently split into another sqrt(K) populations. With 'predefined',
values in zInit will be used to initialize z. Default 'split'.
Character. One of 'random', 'split', or 'predefined'.
With 'random', features are randomly assigned to modules. With 'split',
features will be split into sqrt(L) modules and then each module will be
subsequently split into another sqrt(L) modules. With 'predefined', values
in yInit will be used to initialize y. Default 'split'.
Character. An MD5 checksum for the counts matrix. Default NULL.
Integer vector. Sets initial starting values of z. 'zInit' is only used when `zInitialize = 'predefined'`. Default NULL.
Integer vector. Sets initial starting values of y. 'yInit' is only used when `yInitialize = "predefined"`. Default NULL.
Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL.
Logical. Whether to print log messages. Default TRUE.
A SingleCellExperiment object. Function parameter settings are stored in metadata
"celda_parameters" in altExp slot.
In altExp slot,
columns celda_sample_label and celda_cell_cluster in
colData contain sample labels and celda cell
population clusters. Column celda_feature_module in
rowData contains feature modules.
celda_G for feature clustering and celda_C for clustering cells. celdaGridSearch can be used to run multiple values of K/L and multiple chains in parallel.
data(celdaCGSim)
sce <- celda_CG(celdaCGSim$counts,
K = celdaCGSim$K,
L = celdaCGSim$L,
sampleLabel = celdaCGSim$sampleLabel,
nchains = 1)
#> --------------------------------------------------
#> Starting Celda_CG: Clustering cells and genes.
#> --------------------------------------------------
#> Tue Apr 2 18:54:47 2024 .. Initializing 'z' in chain 1 with 'split'
#> Tue Apr 2 18:54:47 2024 .. Initializing 'y' in chain 1 with 'split'
#> Tue Apr 2 18:54:50 2024 .... Completed iteration: 1 | logLik: -1215542.98684529
#> Tue Apr 2 18:54:50 2024 .... Completed iteration: 2 | logLik: -1215541.0958389
#> Tue Apr 2 18:54:50 2024 .... Completed iteration: 3 | logLik: -1215541.0958389
#> Tue Apr 2 18:54:50 2024 .... Completed iteration: 4 | logLik: -1215542.98684529
#> Tue Apr 2 18:54:50 2024 .... Completed iteration: 5 | logLik: -1215541.0958389
#> Tue Apr 2 18:54:50 2024 .... Completed iteration: 6 | logLik: -1215541.0958389
#> Tue Apr 2 18:54:50 2024 .... Completed iteration: 7 | logLik: -1215541.0958389
#> Tue Apr 2 18:54:50 2024 .... Completed iteration: 8 | logLik: -1215541.0958389
#> Tue Apr 2 18:54:50 2024 .... Completed iteration: 9 | logLik: -1215541.0958389
#> Tue Apr 2 18:54:50 2024 .... Determining if any gene clusters should be split.
#> Tue Apr 2 18:54:51 2024 .... No additional splitting was performed.
#> Tue Apr 2 18:54:51 2024 .... Determining if any cell clusters should be split.
#> Tue Apr 2 18:54:51 2024 .... No additional splitting was performed.
#> Tue Apr 2 18:54:51 2024 .... Completed iteration: 10 | logLik: -1215541.0958389
#> Tue Apr 2 18:54:51 2024 .... Determining if any cell clusters should be split.
#> Tue Apr 2 18:54:51 2024 .... No additional splitting was performed.
#> Tue Apr 2 18:54:51 2024 .... Completed iteration: 11 | logLik: -1215541.0958389
#> Tue Apr 2 18:54:51 2024 .. Finished chain 1
#> --------------------------------------------------
#> Completed Celda_CG. Total time: 3.852616 secs
#> --------------------------------------------------
Clusters the rows of a count matrix containing single-cell data
into L modules. The
useAssay assay slot in
altExpName altExp slot will be used if
it exists. Otherwise, the useAssay
assay slot in x will be used if
x is a SingleCellExperiment object.
celda_G(
x,
useAssay = "counts",
altExpName = "featureSubset",
L,
beta = 1,
delta = 1,
gamma = 1,
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
yInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
yInit = NULL,
logfile = NULL,
verbose = TRUE
)
# S4 method for SingleCellExperiment
celda_G(
x,
useAssay = "counts",
altExpName = "featureSubset",
L,
beta = 1,
delta = 1,
gamma = 1,
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
yInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
yInit = NULL,
logfile = NULL,
verbose = TRUE
)
# S4 method for ANY
celda_G(
x,
useAssay = "counts",
altExpName = "featureSubset",
L,
beta = 1,
delta = 1,
gamma = 1,
stopIter = 10,
maxIter = 200,
splitOnIter = 10,
splitOnLast = TRUE,
seed = 12345,
nchains = 3,
yInitialize = c("split", "random", "predefined"),
countChecksum = NULL,
yInit = NULL,
logfile = NULL,
verbose = TRUE
)
A SingleCellExperiment
with the matrix located in the assay slot under useAssay.
Rows represent features and columns represent cells. Alternatively,
any matrix-like object that can be coerced to a sparse matrix of class
"dgCMatrix" can be directly used as input. The matrix will automatically be
converted to a SingleCellExperiment object.
A string specifying the name of the assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Integer. Number of feature modules.
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell. Default 1.
Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1.
Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1.
Integer. Number of iterations without improvement in the log likelihood to stop inference. Default 10.
Integer. Maximum number of iterations of Gibbs sampling to perform. Default 200.
Integer. On every `splitOnIter` iteration, a heuristic will be applied to determine if a feature module should be reassigned and another feature module should be split into two clusters. To disable splitting, set to -1. Default 10.
Logical. After `stopIter` iterations have been performed without improvement, a heuristic will be applied to determine if a feature module should be reassigned and another feature module should be split into two clusters. If a split occurs, then `stopIter` will be reset. Default TRUE.
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
Integer. Number of random cluster initializations. Default 3.
Character. One of 'random', 'split', or 'predefined'. With 'random', features are randomly assigned to modules. With 'split', features will be split into sqrt(L) modules and then each module will be subsequently split into another sqrt(L) modules. With 'predefined', values in `yInit` will be used to initialize `y`. Default 'split'.
Character. An MD5 checksum for the `counts` matrix. Default NULL.
Integer vector. Sets initial starting values of y. `yInit` can only be used when `yInitialize = 'predefined'`. Default NULL.
Character. Messages will be redirected to a file named
logfile. If NULL, messages will be printed to stdout. Default NULL.
Logical. Whether to print log messages. Default TRUE.
A SingleCellExperiment object. Function parameter settings are stored in the metadata
"celda_parameters" slot. Column celda_feature_module in
rowData contains feature modules.
celda_C for cell clustering and celda_CG for simultaneous clustering of features and cells. celdaGridSearch can be used to run multiple values of L and multiple chains in parallel.
data(celdaGSim)
sce <- celda_G(celdaGSim$counts, L = celdaGSim$L, nchains = 1)
#> --------------------------------------------------
#> Starting Celda_G: Clustering genes.
#> --------------------------------------------------
#> Tue Apr 2 18:54:52 2024 .. Initializing 'y' in chain 1 with 'split'
#> Tue Apr 2 18:54:53 2024 .... Completed iteration: 1 | logLik: -290669.046132139
#> Tue Apr 2 18:54:53 2024 .... Completed iteration: 2 | logLik: -290669.046132139
#> Tue Apr 2 18:54:53 2024 .... Completed iteration: 3 | logLik: -290669.046132139
#> Tue Apr 2 18:54:53 2024 .... Completed iteration: 4 | logLik: -290669.046132139
#> Tue Apr 2 18:54:53 2024 .... Completed iteration: 5 | logLik: -290669.046132139
#> Tue Apr 2 18:54:53 2024 .... Completed iteration: 6 | logLik: -290669.046132139
#> Tue Apr 2 18:54:53 2024 .... Completed iteration: 7 | logLik: -290669.046132139
#> Tue Apr 2 18:54:53 2024 .... Completed iteration: 8 | logLik: -290669.046132139
#> Tue Apr 2 18:54:53 2024 .... Completed iteration: 9 | logLik: -290669.046132139
#> Tue Apr 2 18:54:53 2024 .... Determining if any gene clusters should be split.
#> Tue Apr 2 18:54:54 2024 .... No additional splitting was performed.
#> Tue Apr 2 18:54:54 2024 .... Completed iteration: 10 | logLik: -290669.046132139
#> Tue Apr 2 18:54:54 2024 .... Completed iteration: 11 | logLik: -290669.046132139
#> Tue Apr 2 18:54:54 2024 .. Finished chain 1
#> --------------------------------------------------
#> Completed Celda_G. Total time: 2.01088 secs
#> --------------------------------------------------
Converts an old celda model object (celda_C,
celda_G, or celda_CG object) to a
SingleCellExperiment object containing celda model
information in the metadata slot. The counts matrix is stored in the
"counts" assay slot in assays.
celdatosce(
celdaModel,
counts,
useAssay = "counts",
altExpName = "featureSubset"
)
# S4 method for celda_C
celdatosce(
celdaModel,
counts,
useAssay = "counts",
altExpName = "featureSubset"
)
# S4 method for celda_G
celdatosce(
celdaModel,
counts,
useAssay = "counts",
altExpName = "featureSubset"
)
# S4 method for celda_CG
celdatosce(
celdaModel,
counts,
useAssay = "counts",
altExpName = "featureSubset"
)
# S4 method for celdaList
celdatosce(
celdaModel,
counts,
useAssay = "counts",
altExpName = "featureSubset"
)
A celdaModel or celdaList object generated
using older versions of celda.
A numeric matrix of counts used to generate
celdaModel. Dimensions and MD5 checksum will be checked by
compareCountMatrix.
A string specifying the name of the assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
A SingleCellExperiment object. Function parameter settings are stored in the metadata
"celda_parameters" slot.
Columns celda_sample_label and celda_cell_cluster in
colData contain sample labels and celda cell
population clusters. Column celda_feature_module in
rowData contains feature modules.
data(celdaCMod, celdaCSim)
sce <- celdatosce(celdaCMod, celdaCSim$counts)
data(celdaGMod, celdaGSim)
sce <- celdatosce(celdaGMod, celdaGSim$counts)
data(celdaCGMod, celdaCGSim)
sce <- celdatosce(celdaCGMod, celdaCGSim$counts)
data(celdaCGGridSearchRes, celdaCGSim)
sce <- celdatosce(celdaCGGridSearchRes, celdaCGSim$counts)
Calculates the conditional probability of each cell belonging to each subpopulation given all other cell cluster assignments and/or each feature belonging to each module given all other feature cluster assignments in a celda model.
clusterProbability(
sce,
useAssay = "counts",
altExpName = "featureSubset",
log = FALSE
)
# S4 method for SingleCellExperiment
clusterProbability(
sce,
useAssay = "counts",
altExpName = "featureSubset",
log = FALSE
)
A SingleCellExperiment object returned by
celda_C, celda_G, or celda_CG, with the matrix
located in the useAssay assay slot.
Rows represent features and columns represent cells.
A string specifying which assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Logical. If FALSE, then the normalized conditional
probabilities will be returned. If TRUE, then the unnormalized log
probabilities will be returned. Default FALSE.
A list containing a matrix for the conditional cell subpopulation cluster and/or feature module probabilities.
`celda_C()` for clustering cells
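The relationship between the two settings of log can be sketched in base R: unnormalized log probabilities are converted to normalized probabilities by exponentiating and dividing by the row total. The values and the helper normalizeLogProbs below are made up for illustration and are not celda's internal code; subtracting the row maximum first is a standard way to avoid numerical underflow.

```r
# Made-up unnormalized log probabilities: 2 cells (rows) x 3 populations.
logProb <- matrix(c(-1000, -1001, -1003,
                    -50,   -50,   -52),
                  nrow = 2, byrow = TRUE)

# Hypothetical helper, not part of celda: subtract each row's maximum before
# exponentiating so that exp() does not underflow to zero, then normalize.
normalizeLogProbs <- function(logProb) {
  shifted <- logProb - apply(logProb, 1, max)
  prob <- exp(shifted)
  prob / rowSums(prob)
}

prob <- normalizeLogProbs(logProb)
rowSums(prob)
```

Exponentiating the first row directly would give zeros for every population, since exp(-1000) underflows in double precision; the shifted version still yields a valid probability vector.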
Checks if the counts matrix is the same one used to generate the celda model object by comparing dimensions and MD5 checksum.
compareCountMatrix(counts, celdaMod, errorOnMismatch = TRUE)
# S4 method for ANY,celdaModel
compareCountMatrix(counts, celdaMod, errorOnMismatch = TRUE)
# S4 method for ANY,celdaList
compareCountMatrix(counts, celdaMod, errorOnMismatch = TRUE)
Integer, numeric, or sparse matrix. Rows represent features and columns represent cells.
A celdaModel or celdaList object.
Logical. Whether to throw an error in the event of a mismatch. Default TRUE.
Returns TRUE if provided count matrix matches the one used in the
celda object and/or errorOnMismatch = FALSE, FALSE otherwise.
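The idea of checking a matrix against stored dimensions and an MD5 digest can be sketched with base R alone. This is an assumption-laden illustration: the helpers matrixChecksum and sameCounts are hypothetical, and the digest is computed here with tools::md5sum on a temporary file, so it will not match the checksum that celda itself stores.

```r
# Hypothetical helper, not part of celda: MD5 digest of a numeric matrix,
# computed by writing its values to a temporary file.
matrixChecksum <- function(counts) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(as.numeric(counts), tmp)
  unname(tools::md5sum(tmp))
}

# Compare dimensions first, then the digest recorded at model-fitting time.
sameCounts <- function(counts, storedDim, storedChecksum) {
  identical(dim(counts), storedDim) &&
    identical(matrixChecksum(counts), storedChecksum)
}

counts <- matrix(c(1, 0, 2, 5, 3, 0), nrow = 2)
storedDim <- dim(counts)
storedChecksum <- matrixChecksum(counts)

altered <- counts
altered[1, 1] <- 9

sameCounts(counts, storedDim, storedChecksum)   # same matrix
sameCounts(altered, storedDim, storedChecksum)  # one entry changed
```

Checking dimensions before the digest lets an obviously mismatched matrix be rejected without hashing it.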
Returns the MD5 hash of the count matrix used to generate the celdaList.
# S4 method for celdaList
countChecksum(celdaList)
An object of class celdaList.
A character string of length 32 containing the MD5 digest of the count matrix.
data(celdaCGGridSearchRes)
countChecksum <- countChecksum(celdaCGGridSearchRes)
Returns the MD5 hash of the count matrix used to generate the celdaList.
countChecksum(celdaList)
An object of class celdaList.
A character string of length 32 containing the MD5 digest of the count matrix.
data(celdaCGGridSearchRes)
countChecksum <- countChecksum(celdaCGGridSearchRes)
Identifies contamination from factors such as ambient RNA in single cell genomic datasets.
decontX(x, ...)
# S4 method for SingleCellExperiment
decontX(
x,
assayName = "counts",
z = NULL,
batch = NULL,
background = NULL,
bgAssayName = NULL,
bgBatch = NULL,
maxIter = 500,
delta = c(10, 10),
estimateDelta = TRUE,
convergence = 0.001,
iterLogLik = 10,
varGenes = 5000,
dbscanEps = 1,
seed = 12345,
logfile = NULL,
verbose = TRUE
)
# S4 method for ANY
decontX(
x,
z = NULL,
batch = NULL,
background = NULL,
bgBatch = NULL,
maxIter = 500,
delta = c(10, 10),
estimateDelta = TRUE,
convergence = 0.001,
iterLogLik = 10,
varGenes = 5000,
dbscanEps = 1,
seed = 12345,
logfile = NULL,
verbose = TRUE
)
A numeric matrix of counts or a SingleCellExperiment
with the matrix located in the assay slot under assayName.
Cells in each batch will be subsetted and converted to a sparse matrix
of class dgCMatrix from package Matrix before analysis. This
object should only contain filtered cells after cell calling. Empty
cell barcodes (low expression droplets before cell calling) are not needed
to run DecontX.
For the generic, further arguments to pass to each method.
Character. Name of the assay to use if x is a
SingleCellExperiment.
Numeric or character vector. Cell cluster labels. If NULL, PCA will be used to reduce the dimensionality of the dataset initially, 'umap' from the 'uwot' package will be used to further reduce the dataset to 2 dimensions, and the 'dbscan' function from the 'dbscan' package will be used to identify clusters of broad cell types. Default NULL.
Numeric or character vector. Batch labels for cells. If batch labels are supplied, DecontX is run on cells from each batch separately. Cells run in different channels or assays should be considered different batches. Default NULL.
A numeric matrix of counts or a
SingleCellExperiment with the matrix located in the assay
slot under assayName. It should have the same data format as x
except it contains the empty droplets instead of cells. When supplied,
empirical distribution of transcripts from these empty droplets
will be used as the contamination distribution. Default NULL.
Character. Name of the assay to use if background
is a SingleCellExperiment. Default to same as
assayName.
Numeric or character vector. Batch labels for
background. Its unique values should be the same as those in
batch, so that each batch of cells has a corresponding batch
of empty droplets as background, as indicated by this parameter. Default NULL.
Integer. Maximum iterations of the EM algorithm. Default 500.
Numeric Vector of length 2. Concentration parameters for
the Dirichlet prior for the contamination in each cell. The first element
is the prior for the native counts while the second element is the prior for
the contamination counts. These essentially act as pseudocounts for the
native and contamination in each cell. If estimateDelta = TRUE,
this is only used to produce a random sample of proportions for an initial
value of contamination in each cell. Then
fit_dirichlet is used to update
delta in each iteration.
If estimateDelta = FALSE, then delta is fixed with these
values for the entire inference procedure. Fixing delta and
setting a high number in the second element will force decontX
to be more aggressive and estimate higher levels of contamination at
the expense of potentially removing native expression.
Default c(10, 10).
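To make the pseudocount reading of delta more concrete, here is a small base R sketch. It is not the decontX implementation: the counts (nativeCounts, contamCounts) and the helper shrunkContamination are hypothetical, and the update shown is only the usual Dirichlet posterior-mean form, assumed here for illustration.

```r
# Illustrative sketch only; not the decontX source code.
set.seed(12345)

delta <- c(10, 10)  # c(native prior, contamination prior)
nCells <- 5

# A two-element Dirichlet is a Beta distribution, so initial contamination
# proportions can be drawn with rbeta() (contamination is the second element).
initContamination <- rbeta(nCells, shape1 = delta[2], shape2 = delta[1])

# Hypothetical cell: 900 counts currently assigned to native expression
# and 100 counts assigned to contamination.
nativeCounts <- 900
contamCounts <- 100

# With delta held fixed, the prior behaves like extra counts added to each part.
shrunkContamination <- function(delta) {
  (contamCounts + delta[2]) / (nativeCounts + contamCounts + sum(delta))
}

weakPrior <- shrunkContamination(c(10, 10))
strongPrior <- shrunkContamination(c(10, 1000))
```

In this toy setting, raising the second element of delta pulls the estimate toward higher contamination, which mirrors the more aggressive behavior described above.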
Boolean. Whether to update delta at each
iteration.
Numeric. The EM algorithm will be stopped if the maximum difference in the contamination estimates between the previous and current iterations is less than this. Default 0.001.
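The stopping rule can be sketched in base R as follows. The update step is a made-up stand-in that moves the estimates halfway toward a hypothetical fixed point (target); only the convergence check itself reflects the description above.

```r
# Illustrative sketch of the stopping rule; the update is a toy stand-in
# for an EM step, not the decontX algorithm.
convergence <- 0.001
maxIter <- 500

estimate <- c(0.5, 0.2, 0.8)   # toy starting contamination estimates
target <- c(0.1, 0.3, 0.05)    # hypothetical fixed point of the toy update

iter <- 0
repeat {
  iter <- iter + 1
  previous <- estimate
  estimate <- previous + 0.5 * (target - previous)

  # Stop when the largest change between iterations falls below the
  # threshold, or when the iteration limit is reached.
  maxDiff <- max(abs(estimate - previous))
  if (maxDiff < convergence || iter >= maxIter) {
    break
  }
}
```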
Integer. Calculate log likelihood every iterLogLik
iterations. Default 10.
Integer. The number of variable genes to use in
dimensionality reduction before clustering. Variability is calculated using
the modelGeneVar function from the 'scran' package.
Used only when z is not provided. Default 5000.
Numeric. The clustering resolution parameter used in 'dbscan' to estimate broad cell clusters. Used only when z is not provided. Default 1.
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL.
Logical. Whether to print log messages. Default TRUE.
If x is a matrix-like object, a list will be returned
with the following items:
decontXcounts: The decontaminated matrix. Values obtained
from the variational inference procedure may be non-integer. However,
integer counts can be obtained by rounding,
e.g. round(decontXcounts).
contamination: Percentage of contamination in each cell.
estimates: List of estimated parameters for each batch. If z was not supplied, then the UMAP coordinates used to generate cell cluster labels will also be stored here.
z: Cell population/cluster labels used for analysis.
runParams: List of arguments used in the function call.
If x is a SingleCellExperiment, then the decontaminated
counts will be stored as an assay and can be accessed with
decontXcounts(x). The contamination values and cluster labels
will be stored in colData(x). estimates and runParams
will be stored in metadata(x)$decontX. The UMAPs used to generate
cell cluster labels will be stored in the
reducedDims slot in x.
# Generate matrix with contamination
s <- simulateContamination(seed = 12345)
library(SingleCellExperiment)
sce <- SingleCellExperiment(list(counts = s$observedCounts))
sce <- decontX(sce)
#> --------------------------------------------------
#> Starting DecontX
#> --------------------------------------------------
#> Tue Apr 2 18:54:58 2024 .. Analyzing all cells
#> Tue Apr 2 18:54:58 2024 .... Converting to sparse matrix
#> Tue Apr 2 18:54:58 2024 .... Generating UMAP and estimating cell types
#> Tue Apr 2 18:55:02 2024 .... Estimating contamination
#> Tue Apr 2 18:55:02 2024 ...... Completed iteration: 9 | converge: 0.0009154
#> Tue Apr 2 18:55:02 2024 .. Calculating final decontaminated matrix
#> --------------------------------------------------
#> Completed DecontX. Total time: 4.075659 secs
#> --------------------------------------------------
# Plot contamination on UMAP
plotDecontXContamination(sce)
# Plot decontX cluster labels
umap <- reducedDim(sce)
plotDimReduceCluster(x = sce$decontX_clusters,
    dim1 = umap[, 1], dim2 = umap[, 2])
# Plot percentage of marker genes detected
# in each cell cluster before decontamination
s$markers
#> $CellType_1_Markers
#> [1] "Gene_47" "Gene_32" "Gene_86"
#>
#> $CellType_2_Markers
#> [1] "Gene_70" "Gene_33" "Gene_48"
#>
#> $CellType_3_Markers
#> [1] "Gene_74" "Gene_26" "Gene_20"
#>
plotDecontXMarkerPercentage(sce, markers = s$markers, assayName = "counts")
# Plot percentage of marker genes detected
# in each cell cluster after decontamination
plotDecontXMarkerPercentage(sce, markers = s$markers,
assayName = "decontXcounts")
# Plot percentage of marker genes detected in each cell
# comparing original and decontaminated counts side-by-side
plotDecontXMarkerPercentage(sce, markers = s$markers,
assayName = c("counts", "decontXcounts"))
# Plot raw counts of individual marker genes before
# and after decontamination
plotDecontXMarkerExpression(sce, unlist(s$markers))
Gets or sets the decontaminated counts matrix from a SingleCellExperiment object.
decontXcounts(object, ...)
decontXcounts(object, ...) <- value
# S4 method for SingleCellExperiment
decontXcounts(object, ...)
# S4 method for SingleCellExperiment
decontXcounts(object, ...) <- value
A SingleCellExperiment object.
For the generic, further arguments to pass to each method.
A matrix to save as an assay called decontXcounts.
If getting, the assay from object with the name
decontXcounts will be returned. If setting, a
SingleCellExperiment object will be returned with
decontXcounts listed in the assay slot.
assay and assay<-
Generate a palette of `n` distinct colors.
Integer. Number of colors to generate.
Character vector. Colors available from `colors()`. These will be used as the base colors for the clustering scheme in HSV. Different saturations and values will be generated for each hue. Default c("red", "cyan", "orange", "blue", "yellow", "purple", "green", "magenta").
Numeric vector. A vector of length 2 denoting the saturation for HSV. Values must be in [0,1]. Default: c(0.25, 1).
Numeric vector. A vector of length 2 denoting the range of values for HSV. Values must be in [0,1]. Default: `c(0.5, 1)`.
A vector of distinct colors that have been converted to HEX from HSV.
colorPal <- distinctColors(6) # can be used in plotting functions
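The idea of varying saturation and value for each base hue can be sketched with base R's grDevices functions. This is an assumption-laden illustration, not the distinctColors implementation: the way levels are spaced and ordered here is hypothetical.

```r
# Illustrative sketch only; the level spacing is an assumption, not the
# distinctColors algorithm.
n <- 6
hues <- c("red", "cyan", "orange", "blue")
saturationRange <- c(0.25, 1)
valueRange <- c(0.5, 1)

# Convert the named base colors to HSV; rows are named "h", "s" and "v".
baseHsv <- grDevices::rgb2hsv(grDevices::col2rgb(hues))

# Number of saturation/value levels needed to reach at least n colors.
nLevels <- ceiling(n / length(hues))
sat <- seq(saturationRange[2], saturationRange[1], length.out = nLevels)
val <- seq(valueRange[2], valueRange[1], length.out = nLevels)

# For each level, reuse every hue with a different saturation and value,
# then convert back to HEX strings.
pal <- character(0)
for (i in seq_len(nLevels)) {
  pal <- c(pal, grDevices::hsv(h = baseHsv["h", ], s = sat[i], v = val[i]))
}
pal <- pal[seq_len(n)]
```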
Fast matrix multiplication for double x int
eigenMatMultInt(A, B)
a double matrix
an integer matrix
An integer matrix representing the product of A and B
Fast matrix multiplication for double x double
eigenMatMultNumeric(A, B)
a double matrix
a double matrix
A double matrix representing the product of A and B
Generates factorized matrices showing the contribution of each feature in each cell population or each cell population in each sample.
factorizeMatrix(
x,
celdaMod,
useAssay = "counts",
altExpName = "featureSubset",
type = c("counts", "proportion", "posterior")
)
# S4 method for SingleCellExperiment,ANY
factorizeMatrix(
x,
useAssay = "counts",
altExpName = "featureSubset",
type = c("counts", "proportion", "posterior")
)
# S4 method for ANY,celda_CG
factorizeMatrix(x, celdaMod, type = c("counts", "proportion", "posterior"))
# S4 method for ANY,celda_C
factorizeMatrix(x, celdaMod, type = c("counts", "proportion", "posterior"))
# S4 method for ANY,celda_G
factorizeMatrix(x, celdaMod, type = c("counts", "proportion", "posterior"))
Can be one of:
A SingleCellExperiment object returned by
celda_C, celda_G or celda_CG, with the matrix
located in the useAssay assay slot in altExp(x, altExpName).
Rows represent features and columns represent cells.
Integer counts matrix. Rows represent features and columns represent
cells. This matrix should be the same as the one used to generate
celdaMod.
Celda model object. Only works if x is an integer
counts matrix.
A string specifying which assay
slot to use if x is a SingleCellExperiment object.
Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Character vector. A vector containing one or more of "counts",
"proportion", or "posterior". "counts" returns the raw number of counts for
each factorized matrix. "proportion" returns the normalized probabilities
for each factorized matrix, which are calculated by dividing the raw counts
in each factorized matrix by the total counts in each column. "posterior"
returns the posterior estimates which include the addition of the Dirichlet
concentration parameter (essentially as a pseudocount). Default
"counts".
For celda_CG model, a list with elements for "counts", "proportions", or "posterior" probabilities. Each element will be a list containing factorized matrices for "module", "cellPopulation", and "sample". Additionally, the contribution of each module in each individual cell will be included in the "cell" element of "counts" and "proportions" elements.
For celda_C model, a list with elements for "counts", "proportions", or "posterior" probabilities. Each element will be a list containing factorized matrices for "module" and "sample".
For celda_G model, a list with elements for "counts", "proportions", or "posterior" probabilities. Each element will be a list containing factorized matrices for "module" and "cell".
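The relationship between the three matrix types can be illustrated on a toy feature-by-module count matrix in base R. The matrix and the concentration value beta below are hypothetical; this is a sketch of the arithmetic described for type, not the factorizeMatrix code.

```r
# Illustrative sketch only; toy data and a hypothetical concentration value.
counts <- matrix(
  c(10, 0, 30,
    5, 5, 10),
  nrow = 3,
  dimnames = list(paste0("Gene_", 1:3), c("L1", "L2"))
)
beta <- 1  # hypothetical Dirichlet concentration parameter

# "proportion": divide raw counts by the total counts in each column.
proportion <- sweep(counts, 2, colSums(counts), "/")

# "posterior": add the concentration parameter as a pseudocount first.
posterior <- sweep(counts + beta, 2, colSums(counts + beta), "/")
```

Note that Gene_2 has a proportion of exactly zero in L1, while its posterior estimate is small but positive because of the pseudocount.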
data(sceCeldaCG)
factorizedMatrices <- factorizeMatrix(sceCeldaCG, type = "posterior")
data(celdaCGSim, celdaCGMod)
factorizedMatrices <- factorizeMatrix(
celdaCGSim$counts,
celdaCGMod,
"posterior")
data(celdaCSim, celdaCMod)
factorizedMatrices <- factorizeMatrix(
celdaCSim$counts,
celdaCMod, "posterior"
)
data(celdaGSim, celdaGMod)
factorizedMatrices <- factorizeMatrix(
celdaGSim$counts,
celdaGMod, "posterior"
)
Fast normalization for numeric matrix
fastNormProp(R_counts, R_alpha)
An integer matrix
A double value to be added to the matrix as a pseudocount
A numeric matrix where the columns have been normalized to proportions
Fast normalization for numeric matrix
fastNormPropLog(R_counts, R_alpha)
An integer matrix
A double value to be added to the matrix as a pseudocount
A numeric matrix where the columns have been normalized to proportions
Fast normalization for numeric matrix
fastNormPropSqrt(R_counts, R_alpha)
An integer matrix
A double value to be added to the matrix as a pseudocount
A numeric matrix where the columns have been normalized to proportions
This function will output the corresponding feature module for
a specified vector of genes from a celda_CG or celda_G celdaModel.
features must match the rownames of sce.
featureModuleLookup(
sce,
features,
altExpName = "featureSubset",
exactMatch = TRUE,
by = "rownames"
)
# S4 method for SingleCellExperiment
featureModuleLookup(
sce,
features,
altExpName = "featureSubset",
exactMatch = TRUE,
by = "rownames"
)
A SingleCellExperiment object returned by
celda_G or celda_CG, with the matrix
located in the useAssay assay slot.
Rows represent features and columns represent cells.
Character vector. Identify feature modules for the specified
feature names. features must match the rownames of sce.
The name for the altExp slot to use. Default "featureSubset".
Logical. Whether to look for an exact match of the gene name
within the counts matrix. Default TRUE.
Character. Where to search for features in the sce object.
If set to "rownames" then the features will be searched for among
rownames(sce). This can also be set to one of the colnames of
rowData(sce). Default "rownames".
Numeric vector containing the module numbers for each feature. If
the feature was not found, then an NA value will be returned in that
position. If no features were found, then an error will be given.
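The return behavior described above (NA for a missing feature, an error when nothing matches) can be mimicked with base R's match(). The feature names, module labels, and the helper lookupModules are hypothetical; this is a sketch of the exact-match case, not the featureModuleLookup code.

```r
# Illustrative sketch only; names and labels are made up.
featureNames <- c("Gene_1", "Gene_10", "Gene_2", "Gene_20")
moduleLabels <- c(1L, 2L, 1L, 3L)  # hypothetical module of each feature

lookupModules <- function(features) {
  # match() returns NA for any feature that is not present.
  idx <- match(features, featureNames)
  if (all(is.na(idx))) {
    stop("None of the requested features were found.")
  }
  moduleLabels[idx]
}

found <- lookupModules(c("Gene_10", "Gene_99"))
```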
Creates a table that contains the list of features in each feature module.
featureModuleTable(
sce,
useAssay = "counts",
altExpName = "featureSubset",
displayName = NULL,
outputFile = NULL
)
A SingleCellExperiment object returned by
celda_G or celda_CG, with the matrix
located in the useAssay assay slot.
Rows represent features and columns represent cells.
A string specifying which assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Character. The column name of
rowData(sce) that specifies the display names for
the features. Default NULL, which displays the row names.
File name for feature module table. If NULL, file will not be created. Default NULL.
Matrix. Each column (feature module) contains the list of features in that module.
data(sceCeldaCG)
featureModuleTable(sceCeldaCG)
#> L1 L2 L3 L4 L5 L6 L7
#> [1,] "Gene_45" "Gene_77" "Gene_72" "Gene_87" "Gene_41" "Gene_80" "Gene_24"
#> [2,] "Gene_5" "Gene_17" "Gene_15" "Gene_6" "Gene_70" "Gene_60" "Gene_53"
#> [3,] "Gene_94" "Gene_30" "Gene_22" "Gene_82" "Gene_64" "Gene_55" "Gene_74"
#> [4,] "Gene_91" "Gene_9" "Gene_52" "Gene_28" "Gene_36" "Gene_40" "Gene_59"
#> [5,] "Gene_3" "" "Gene_78" "Gene_39" "Gene_37" "Gene_25" "Gene_89"
#> [6,] "Gene_50" "" "Gene_84" "Gene_10" "Gene_14" "Gene_62" "Gene_11"
#> [7,] "Gene_90" "" "Gene_81" "Gene_51" "Gene_99" "Gene_97" "Gene_49"
#> [8,] "" "" "" "Gene_100" "Gene_20" "Gene_95" "Gene_1"
#> [9,] "" "" "" "Gene_65" "Gene_2" "Gene_46" "Gene_79"
#> [10,] "" "" "" "Gene_33" "Gene_44" "Gene_76" ""
#> [11,] "" "" "" "Gene_38" "Gene_66" "" ""
#> [12,] "" "" "" "" "Gene_32" "" ""
#> [13,] "" "" "" "" "Gene_34" "" ""
#> [14,] "" "" "" "" "Gene_23" "" ""
#> [15,] "" "" "" "" "" "" ""
#> [16,] "" "" "" "" "" "" ""
#> L8 L9 L10
#> [1,] "Gene_4" "Gene_85" "Gene_69"
#> [2,] "Gene_43" "Gene_98" "Gene_47"
#> [3,] "Gene_42" "Gene_35" "Gene_13"
#> [4,] "Gene_29" "Gene_8" "Gene_92"
#> [5,] "Gene_73" "Gene_83" "Gene_88"
#> [6,] "Gene_61" "Gene_19" "Gene_93"
#> [7,] "" "Gene_63" "Gene_12"
#> [8,] "" "Gene_75" "Gene_86"
#> [9,] "" "Gene_21" "Gene_58"
#> [10,] "" "Gene_48" "Gene_18"
#> [11,] "" "Gene_31" "Gene_7"
#> [12,] "" "Gene_57" "Gene_16"
#> [13,] "" "Gene_27" ""
#> [14,] "" "Gene_67" ""
#> [15,] "" "Gene_68" ""
#> [16,] "" "Gene_54" ""
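The padded layout of this table, with one column per module and empty strings filling the shorter columns, can be reproduced on toy data in base R. The features and module labels below are hypothetical, and the ordering of features within a module is not meant to match what featureModuleTable does.

```r
# Illustrative sketch only; toy feature-to-module assignments.
features <- paste0("Gene_", 1:7)
modules <- c(1, 2, 1, 3, 2, 1, 3)

# Group features by module, then pad every group to the same length.
byModule <- split(features, modules)
maxLen <- max(lengths(byModule))
padded <- vapply(
  byModule,
  function(f) c(f, rep("", maxLen - length(f))),
  character(maxLen)
)
colnames(padded) <- paste0("L", names(byModule))
```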
Identify and return significantly enriched terms for each gene module in a Celda object or a SingleCellExperiment object. Performs gene set enrichment analysis for Celda-identified modules using the enrichr function from the enrichR package.
geneSetEnrich(
x,
celdaModel,
useAssay = "counts",
altExpName = "featureSubset",
databases,
fdr = 0.05
)
# S4 method for SingleCellExperiment
geneSetEnrich(
x,
useAssay = "counts",
altExpName = "featureSubset",
databases,
fdr = 0.05
)
# S4 method for matrix
geneSetEnrich(x, celdaModel, databases, fdr = 0.05)
A numeric matrix of counts or a
SingleCellExperiment
with the matrix located in the assay slot under useAssay.
Rows represent features and columns represent cells. Rownames of the
matrix or SingleCellExperiment object should be gene names.
Celda object of class celda_G or celda_CG.
A string specifying which assay
slot to use if x is a
SingleCellExperiment object. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Character vector. Name of reference database. Available databases can be viewed by listEnrichrDbs.
Numeric. False discovery rate (FDR) cutoff for the adjusted p-value. Terms with an FDR below this value are considered significantly enriched.
List of length 'L' where each member contains the significantly enriched terms for the corresponding module.
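The role of the fdr cutoff can be sketched in base R with made-up p-values for one module. The term names and p-values are hypothetical, and the Benjamini-Hochberg adjustment via p.adjust() is an assumption chosen for illustration; the adjusted p-values in geneSetEnrich come from the enrichment results themselves.

```r
# Illustrative sketch only; toy terms and p-values.
fdr <- 0.05
terms <- data.frame(
  Term = c("Term_A", "Term_B", "Term_C", "Term_D"),
  P.value = c(0.001, 0.02, 0.04, 0.5),
  stringsAsFactors = FALSE
)

# Adjust for multiple testing, then keep terms below the cutoff.
terms$adjusted <- p.adjust(terms$P.value, method = "BH")
significant <- terms[terms$adjusted < fdr, ]
```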
library(M3DExampleData)
counts <- M3DExampleData::Mmus_example_list$data
# subset 500 genes for fast clustering
counts <- counts[seq(1501, 2000), ]
# cluster genes into 10 modules for quick demo
sce <- celda_G(x = as.matrix(counts), L = 10, verbose = FALSE)
gse <- geneSetEnrich(sce,
databases = c("GO_Biological_Process_2018", "GO_Molecular_Function_2018"))
#> Error in handle_url(handle, url, ...): Must specify at least one of url or handle
Primary celda functions
Functions for clustering of cells

Cell and feature clustering with Celda
Cell clustering with Celda
Feature clustering with Celda
Generate an HTML report for celda_CG
Simple feature selection by feature counts
Split celda feature module

Visualization functions for celda results
Functions for displaying celda results on 2-D embeddings, heatmaps, and violin plots

Uniform Manifold Approximation and Projection (UMAP) dimension reduction for celda
t-Distributed Stochastic Neighbor Embedding (t-SNE) dimension reduction for celda
Heatmap for featureModules
Probability map for a celda model
Plotting the cell labels on a dimension reduction plot
Plotting feature expression on a dimension reduction plot
Plotting Celda module probability on a dimension reduction plot
Mapping the dimension reduction plot
Feature Expression Violin Plot
Plot celda Heatmap

Primary decontX functions
Functions for estimating and displaying contamination with decontX

Contamination estimation with decontX
Plots contamination on UMAP coordinates
Plots expression of marker genes before and after decontamination
Plots the percentage of cells within cell types that express markers
Get or set decontaminated counts matrix

Functions for determining the numbers of clusters in celda
Functions for running and comparing multiple celda models with different number of modules or cell populations

Recursive cell splitting
Recursive module splitting
Visualize perplexity differences of a list of celda models
Run Celda in parallel with multiple parameters
Visualize perplexity of a list of celda models
Calculate the perplexity of a celda model
Calculate and visualize perplexity of all models in a celdaList
Select best chain within each combination of parameters
Get final celdaModels from a celda model
Subset celda model from SCE object returned from
Append two celdaList objects
Get perplexity for every model in a celdaList

Miscellaneous celda functions
Various functions for manipulation of celda results

Get or set the cell cluster labels from a celda SingleCellExperiment object or celda model object.
Get or set the feature module labels from a celda SingleCellExperiment object.
Recode feature module labels
Recode cell cluster labels
Reorder cells populations and/or features modules using hierarchical clustering
Obtain the gene module of a gene of interest
Output a feature module table
Celda models
Get parameter values provided for celdaModel creation
Get run parameters from a celda model
Generate factorized matrices showing each feature's influence on cell / gene clustering
Get the log-likelihood
Get the conditional probabilities of cell in subpopulations from celda model
Gene set enrichment
Plots heatmap based on Celda model
Retrieve row index for a set of features
Normalization of count data
Create a color palette
Get feature, cell and sample names from a celdaModel
Calculate the Log-likelihood of a celda model
Get log-likelihood history
Identify features with the highest influence on clustering.
Get or set sample labels from a celda SingleCellExperiment object

Simulation functions
Functions for generating data from the generative process of each model

Simulate count data from the celda generative models.
Simulate contaminated count matrix

Data objects
Small data objects used in examples

sceCeldaCG
sceCeldaC
sceCeldaG
sceCeldaCGGridSearch
celdaCGGridSearchRes
sampleCells
contaminationSim

Calculate the log-likelihood for cell population and feature module cluster assignments on the count matrix, per celda model.
logLikelihood(x, celdaMod, useAssay = "counts", altExpName = "featureSubset")
# S4 method for SingleCellExperiment,ANY
logLikelihood(x, useAssay = "counts", altExpName = "featureSubset")
# S4 method for matrix,celda_C
logLikelihood(x, celdaMod)
# S4 method for matrix,celda_G
logLikelihood(x, celdaMod)
# S4 method for matrix,celda_CG
logLikelihood(x, celdaMod)
A SingleCellExperiment object returned by
celda_C, celda_G, or celda_CG, with the matrix
located in the useAssay assay slot.
Rows represent features and columns represent cells.
celda model object. Ignored if x is a
SingleCellExperiment object.
A string specifying which assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
The log-likelihood of the cluster assignment for the provided SingleCellExperiment.
`celda_C()` for clustering cells
data(sceCeldaC, sceCeldaCG)
loglikC <- logLikelihood(sceCeldaC)
loglikCG <- logLikelihood(sceCeldaCG)
Retrieves the complete log-likelihood from all iterations of Gibbs sampling used to generate a celda model.
logLikelihoodHistory(x, altExpName = "featureSubset")
# S4 method for SingleCellExperiment
logLikelihoodHistory(x, altExpName = "featureSubset")
# S4 method for celdaModel
logLikelihoodHistory(x)
A SingleCellExperiment object returned by celda_C, celda_G, or celda_CG, or a celda model object.
The name for the altExp slot to use. Default "featureSubset".
Numeric. The log-likelihood at each step of Gibbs sampling used to generate the model.
data(sceCeldaCG)
logLikelihoodHistory(sceCeldaCG)
#> [1] -1212891 -1212891 -1212891 -1212891 -1212891 -1212891 -1212891 -1212891
#> [9] -1212891 -1212891 -1212891 -1212891
data(celdaCGMod)
logLikelihoodHistory(celdaCGMod)
#> [1] -1215541 -1215541 -1215541 -1215541 -1215541 -1215541 -1215541 -1215541
#> [9] -1215541 -1215541 -1215541 -1215541
Retrieves the row, column, and sample names used to generate a celdaModel.
matrixNames(celdaMod)
# S4 method for celdaModel
matrixNames(celdaMod)
celdaModel. Options available in `celda::availableModels`.
List. Contains row, column, and sample character vectors corresponding to the values provided when the celdaModel was generated.
data(celdaCGMod)
matrixNames(celdaCGMod)
#> $row
#> [1] "Gene_1" "Gene_2" "Gene_3" "Gene_4" "Gene_5" "Gene_6"
#> [7] "Gene_7" "Gene_8" "Gene_9" "Gene_10" "Gene_11" "Gene_12"
#> [13] "Gene_13" "Gene_14" "Gene_15" "Gene_16" "Gene_17" "Gene_18"
#> [19] "Gene_19" "Gene_20" "Gene_21" "Gene_22" "Gene_23" "Gene_24"
#> [25] "Gene_25" "Gene_26" "Gene_27" "Gene_28" "Gene_29" "Gene_30"
#> [31] "Gene_31" "Gene_32" "Gene_33" "Gene_34" "Gene_35" "Gene_36"
#> [37] "Gene_37" "Gene_38" "Gene_39" "Gene_40" "Gene_41" "Gene_42"
#> [43] "Gene_43" "Gene_44" "Gene_45" "Gene_46" "Gene_47" "Gene_48"
#> [49] "Gene_49" "Gene_50" "Gene_51" "Gene_52" "Gene_53" "Gene_54"
#> [55] "Gene_55" "Gene_56" "Gene_57" "Gene_58" "Gene_59" "Gene_60"
#> [61] "Gene_61" "Gene_62" "Gene_63" "Gene_64" "Gene_65" "Gene_66"
#> [67] "Gene_67" "Gene_68" "Gene_69" "Gene_70" "Gene_71" "Gene_72"
#> [73] "Gene_73" "Gene_74" "Gene_75" "Gene_76" "Gene_77" "Gene_78"
#> [79] "Gene_79" "Gene_80" "Gene_81" "Gene_82" "Gene_83" "Gene_84"
#> [85] "Gene_85" "Gene_86" "Gene_87" "Gene_88" "Gene_89" "Gene_90"
#> [91] "Gene_91" "Gene_92" "Gene_93" "Gene_94" "Gene_95" "Gene_96"
#> [97] "Gene_97" "Gene_98" "Gene_99" "Gene_100"
#>
#> $column
#> [1] "Cell_1" "Cell_2" "Cell_3" "Cell_4" "Cell_5" "Cell_6"
#> [7] "Cell_7" "Cell_8" "Cell_9" "Cell_10" "Cell_11" "Cell_12"
#> [13] "Cell_13" "Cell_14" "Cell_15" "Cell_16" "Cell_17" "Cell_18"
#> [19] "Cell_19" "Cell_20" "Cell_21" "Cell_22" "Cell_23" "Cell_24"
#> [25] "Cell_25" "Cell_26" "Cell_27" "Cell_28" "Cell_29" "Cell_30"
#> [31] "Cell_31" "Cell_32" "Cell_33" "Cell_34" "Cell_35" "Cell_36"
#> [37] "Cell_37" "Cell_38" "Cell_39" "Cell_40" "Cell_41" "Cell_42"
#> [43] "Cell_43" "Cell_44" "Cell_45" "Cell_46" "Cell_47" "Cell_48"
#> [49] "Cell_49" "Cell_50" "Cell_51" "Cell_52" "Cell_53" "Cell_54"
#> [55] "Cell_55" "Cell_56" "Cell_57" "Cell_58" "Cell_59" "Cell_60"
#> [61] "Cell_61" "Cell_62" "Cell_63" "Cell_64" "Cell_65" "Cell_66"
#> [67] "Cell_67" "Cell_68" "Cell_69" "Cell_70" "Cell_71" "Cell_72"
#> [73] "Cell_73" "Cell_74" "Cell_75" "Cell_76" "Cell_77" "Cell_78"
#> [79] "Cell_79" "Cell_80" "Cell_81" "Cell_82" "Cell_83" "Cell_84"
#> [85] "Cell_85" "Cell_86" "Cell_87" "Cell_88" "Cell_89" "Cell_90"
#> [91] "Cell_91" "Cell_92" "Cell_93" "Cell_94" "Cell_95" "Cell_96"
#> [97] "Cell_97" "Cell_98" "Cell_99" "Cell_100" "Cell_101" "Cell_102"
#> [103] "Cell_103" "Cell_104" "Cell_105" "Cell_106" "Cell_107" "Cell_108"
#> [109] "Cell_109" "Cell_110" "Cell_111" "Cell_112" "Cell_113" "Cell_114"
#> [115] "Cell_115" "Cell_116" "Cell_117" "Cell_118" "Cell_119" "Cell_120"
#> [121] "Cell_121" "Cell_122" "Cell_123" "Cell_124" "Cell_125" "Cell_126"
#> [127] "Cell_127" "Cell_128" "Cell_129" "Cell_130" "Cell_131" "Cell_132"
#> [133] "Cell_133" "Cell_134" "Cell_135" "Cell_136" "Cell_137" "Cell_138"
#> [139] "Cell_139" "Cell_140" "Cell_141" "Cell_142" "Cell_143" "Cell_144"
#> [145] "Cell_145" "Cell_146" "Cell_147" "Cell_148" "Cell_149" "Cell_150"
#> [151] "Cell_151" "Cell_152" "Cell_153" "Cell_154" "Cell_155" "Cell_156"
#> [157] "Cell_157" "Cell_158" "Cell_159" "Cell_160" "Cell_161" "Cell_162"
#> [163] "Cell_163" "Cell_164" "Cell_165" "Cell_166" "Cell_167" "Cell_168"
#> [169] "Cell_169" "Cell_170" "Cell_171" "Cell_172" "Cell_173" "Cell_174"
#> [175] "Cell_175" "Cell_176" "Cell_177" "Cell_178" "Cell_179" "Cell_180"
#> [181] "Cell_181" "Cell_182" "Cell_183" "Cell_184" "Cell_185" "Cell_186"
#> [187] "Cell_187" "Cell_188" "Cell_189" "Cell_190" "Cell_191" "Cell_192"
#> [193] "Cell_193" "Cell_194" "Cell_195" "Cell_196" "Cell_197" "Cell_198"
#> [199] "Cell_199" "Cell_200" "Cell_201" "Cell_202" "Cell_203" "Cell_204"
#> [205] "Cell_205" "Cell_206" "Cell_207" "Cell_208" "Cell_209" "Cell_210"
#> [211] "Cell_211" "Cell_212" "Cell_213" "Cell_214" "Cell_215" "Cell_216"
#> [217] "Cell_217" "Cell_218" "Cell_219" "Cell_220" "Cell_221" "Cell_222"
#> [223] "Cell_223" "Cell_224" "Cell_225" "Cell_226" "Cell_227" "Cell_228"
#> [229] "Cell_229" "Cell_230" "Cell_231" "Cell_232" "Cell_233" "Cell_234"
#> [235] "Cell_235" "Cell_236" "Cell_237" "Cell_238" "Cell_239" "Cell_240"
#> [241] "Cell_241" "Cell_242" "Cell_243" "Cell_244" "Cell_245" "Cell_246"
#> [247] "Cell_247" "Cell_248" "Cell_249" "Cell_250" "Cell_251" "Cell_252"
#> [253] "Cell_253" "Cell_254" "Cell_255" "Cell_256" "Cell_257" "Cell_258"
#> [259] "Cell_259" "Cell_260" "Cell_261" "Cell_262" "Cell_263" "Cell_264"
#> [265] "Cell_265" "Cell_266" "Cell_267" "Cell_268" "Cell_269" "Cell_270"
#> [271] "Cell_271" "Cell_272" "Cell_273" "Cell_274" "Cell_275" "Cell_276"
#> [277] "Cell_277" "Cell_278" "Cell_279" "Cell_280" "Cell_281" "Cell_282"
#> [283] "Cell_283" "Cell_284" "Cell_285" "Cell_286" "Cell_287" "Cell_288"
#> [289] "Cell_289" "Cell_290" "Cell_291" "Cell_292" "Cell_293" "Cell_294"
#> [295] "Cell_295" "Cell_296" "Cell_297" "Cell_298" "Cell_299" "Cell_300"
#> [301] "Cell_301" "Cell_302" "Cell_303" "Cell_304" "Cell_305" "Cell_306"
#> [307] "Cell_307" "Cell_308" "Cell_309" "Cell_310" "Cell_311" "Cell_312"
#> [313] "Cell_313" "Cell_314" "Cell_315" "Cell_316" "Cell_317" "Cell_318"
#> [319] "Cell_319" "Cell_320" "Cell_321" "Cell_322" "Cell_323" "Cell_324"
#> [325] "Cell_325" "Cell_326" "Cell_327" "Cell_328" "Cell_329" "Cell_330"
#> [331] "Cell_331" "Cell_332" "Cell_333" "Cell_334" "Cell_335" "Cell_336"
#> [337] "Cell_337" "Cell_338" "Cell_339" "Cell_340" "Cell_341" "Cell_342"
#> [343] "Cell_343" "Cell_344" "Cell_345" "Cell_346" "Cell_347" "Cell_348"
#> [349] "Cell_349" "Cell_350" "Cell_351" "Cell_352" "Cell_353" "Cell_354"
#> [355] "Cell_355" "Cell_356" "Cell_357" "Cell_358" "Cell_359" "Cell_360"
#> [361] "Cell_361" "Cell_362" "Cell_363" "Cell_364" "Cell_365" "Cell_366"
#> [367] "Cell_367" "Cell_368" "Cell_369" "Cell_370" "Cell_371" "Cell_372"
#> [373] "Cell_373" "Cell_374" "Cell_375" "Cell_376" "Cell_377" "Cell_378"
#> [379] "Cell_379" "Cell_380" "Cell_381" "Cell_382" "Cell_383" "Cell_384"
#> [385] "Cell_385" "Cell_386" "Cell_387" "Cell_388" "Cell_389" "Cell_390"
#> [391] "Cell_391" "Cell_392" "Cell_393" "Cell_394" "Cell_395" "Cell_396"
#> [397] "Cell_397" "Cell_398" "Cell_399" "Cell_400" "Cell_401" "Cell_402"
#> [403] "Cell_403" "Cell_404" "Cell_405" "Cell_406" "Cell_407" "Cell_408"
#> [409] "Cell_409" "Cell_410" "Cell_411" "Cell_412" "Cell_413" "Cell_414"
#> [415] "Cell_415" "Cell_416" "Cell_417" "Cell_418" "Cell_419" "Cell_420"
#> [421] "Cell_421" "Cell_422" "Cell_423" "Cell_424" "Cell_425"
#>
#> $sample
#> [1] "Sample_1" "Sample_2" "Sample_3" "Sample_4" "Sample_5"
#>
Renders a heatmap for selected featureModule. Cells are
ordered from those with the lowest probability of the module on the left to
the highest probability on the right. Features are ordered from those
with the highest probability in the module
on the top to the lowest probability on the bottom.
moduleHeatmap(
x,
useAssay = "counts",
altExpName = "featureSubset",
modules = NULL,
featureModule = NULL,
col = circlize::colorRamp2(c(-2, 0, 2), c("#1E90FF", "#FFFFFF", "#CD2626")),
topCells = 100,
topFeatures = NULL,
normalizedCounts = NA,
normalize = "proportion",
transformationFun = sqrt,
scaleRow = scale,
showFeatureNames = TRUE,
displayName = NULL,
trim = c(-2, 2),
rowFontSize = NULL,
showHeatmapLegend = FALSE,
showTopAnnotationLegend = FALSE,
showTopAnnotationName = FALSE,
topAnnotationHeight = 5,
showModuleLabel = TRUE,
moduleLabel = "auto",
moduleLabelSize = NULL,
byrow = TRUE,
top = NA,
unit = "mm",
ncol = NULL,
useRaster = TRUE,
returnAsList = FALSE,
...
)
# S4 method for SingleCellExperiment
moduleHeatmap(
x,
useAssay = "counts",
altExpName = "featureSubset",
modules = NULL,
featureModule = NULL,
col = circlize::colorRamp2(c(-2, 0, 2), c("#1E90FF", "#FFFFFF", "#CD2626")),
topCells = 100,
topFeatures = NULL,
normalizedCounts = NA,
normalize = "proportion",
transformationFun = sqrt,
scaleRow = scale,
showFeatureNames = TRUE,
displayName = NULL,
trim = c(-2, 2),
rowFontSize = NULL,
showHeatmapLegend = FALSE,
showTopAnnotationLegend = FALSE,
showTopAnnotationName = FALSE,
topAnnotationHeight = 5,
showModuleLabel = TRUE,
moduleLabel = "auto",
moduleLabelSize = NULL,
byrow = TRUE,
top = NA,
unit = "mm",
ncol = NULL,
useRaster = TRUE,
returnAsList = FALSE,
...
)
A numeric matrix of counts or a
SingleCellExperiment
with the matrix located in the assay slot under useAssay.
Rows represent features and columns represent cells. Celda
results must be present under metadata(altExp(x, altExpName)).
A string specifying which assay
slot to use if x is a
SingleCellExperiment object. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Integer Vector. The featureModule(s) to display.
Multiple modules can be included in a vector. Default NULL which
plots all module heatmaps.
Same as modules. Either can be used to specify
the modules to display.
Passed to Heatmap. Set color boundaries and colors.
Integer. Number of cells with the highest and lowest
probabilities for each module to include in the heatmap. For example, if
topCells = 50, the 50 cells with the lowest probabilities and
the 50 cells
with the highest probabilities for each featureModule will be included. If
NULL, all cells will be plotted. Default 100.
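The selection implied by topCells can be sketched in base R: sort cells by their module probability, then keep the lowest and highest topCells of them. The probabilities below are random toy values and the handling of small inputs is an assumption; this is not the moduleHeatmap code.

```r
# Illustrative sketch only; toy module probabilities for 20 cells.
set.seed(1)
moduleProb <- setNames(runif(20), paste0("Cell_", 1:20))
topCells <- 3

# Order cells from lowest to highest probability (left to right).
cellOrder <- order(moduleProb)

# Keep all cells if topCells is NULL or the two ends would overlap
# (the overlap rule is an assumption made for this sketch).
if (is.null(topCells) || 2 * topCells >= length(cellOrder)) {
  keep <- cellOrder
} else {
  keep <- c(head(cellOrder, topCells), tail(cellOrder, topCells))
}
selected <- moduleProb[keep]
```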
Integer. Plot `topFeatures` features with the highest
probabilities in the module heatmap for each featureModule. If NULL,
plot all features in the module. Default NULL.
Integer matrix. Rows represent features and columns
represent cells. If you have a normalized matrix result from
normalizeCounts, you can pass through the result here to
skip the normalization step in this function. Make sure the colnames and
rownames match the object in x. This matrix should
correspond to one generated from this count matrix
assay(altExp(x, altExpName), i = useAssay). If NA,
normalization will be carried out in the following form
normalizeCounts(assay(altExp(x, altExpName), i = useAssay),
normalize = "proportion", transformationFun = sqrt).
Use of this parameter is particularly useful for plotting many
module heatmaps, where normalizing the counts matrix repeatedly would
be too time consuming. Default NA.
Character. Passed to normalizeCounts if
normalizedCounts is NA.
Divides counts by the library sizes for each cell. One of 'proportion',
'cpm', 'median', or 'mean'. 'proportion' uses the total counts for each
cell as the library size. 'cpm' divides the library size of each cell by
one million to produce counts per million. 'median' divides the library
size of each cell by the median library size across all cells. 'mean'
divides the library size of each cell by the mean library size across all
cells. Default "proportion".
Function. Passed to normalizeCounts if
normalizedCounts is NA. Applies a transformation such as
sqrt, log, log2, log10, or log1p.
If NULL, no transformation will be applied. Occurs after
normalization. Default sqrt.
Function. Which function to use to scale each individual row. Set to NULL to disable. Occurs after normalization and log transformation. For example, scale will Z-score transform each row. Default scale.
Logical. Whether feature names should be displayed. Default TRUE.
Character. The column name of
rowData(altExp(x, altExpName)) that specifies the display names for
the features. Default NULL, which displays the row names. Only works
if showFeatureNames is TRUE and x is a
SingleCellExperiment object.
Numeric vector. Vector of length two that specifies the lower
and upper bounds for plotting the data. This threshold is applied
after row scaling. Set to NULL to disable. Default c(-2,2).
Numeric. Font size for feature names. If NULL,
then the size will automatically be determined. Default NULL.
Passed to Heatmap. Show legend for expression levels.
Passed to HeatmapAnnotation. Show legend for cell annotation.
Passed to HeatmapAnnotation. Show heatmap top annotation name.
Passed to HeatmapAnnotation. Column annotation height. rowAnnotation. Show legend for module annotation.
Show left side module labels.
The left side row titles for module heatmap. Must be
vector of the same length as featureModule. Default "auto", which
automatically pulls module labels from x.
Passed to gpar. The size of text (in points).
Logical. Passed to matrix. If FALSE, the figure panel is filled by
columns; otherwise it is filled by rows. Default TRUE.
Passed to marrangeGrob. The title for each page.
Passed to unit. Single character object defining the unit of all dimensions defined.
Integer. Number of columns of module heatmaps. If NULL,
then this will be automatically calculated so that the number of columns
and rows will be approximately the same. Default NULL.
Boolean. Rasterizing will make the heatmap a single object
and reduce the memory of the plot and the size of a file. If NULL,
then rasterization will be automatically determined by the underlying
Heatmap function. Default TRUE.
Boolean. If TRUE, then a list of plots will be
returned instead of a single multi-panel figure. These plots can be
displayed using the grid.draw function. Default FALSE.
Additional parameters passed to Heatmap.
A list object if plotting more than one module heatmap. Otherwise a
HeatmapList object is returned.
data(sceCeldaCG)
moduleHeatmap(sceCeldaCG, displayName = "rownames")
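The topCells rule described for moduleHeatmap can be pictured with a small base-R sketch. It assumes a hypothetical named vector `moduleProb` of per-cell probabilities for one module and only illustrates the selection idea; it is not celda's internal code.

```r
# Hypothetical per-cell probabilities for a single feature module.
moduleProb <- c(Cell_1 = 0.10, Cell_2 = 0.45, Cell_3 = 0.02,
                Cell_4 = 0.30, Cell_5 = 0.80, Cell_6 = 0.25)
topCells <- 2

# Order cells from lowest to highest probability.
ord <- order(moduleProb)

# Keep the topCells lowest and the topCells highest cells.
selected <- names(moduleProb)[c(head(ord, topCells), tail(ord, topCells))]
print(selected)
```

With topCells = NULL, no such subsetting would be needed and every cell would be kept.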
Get the row and column indices of nonzero elements in the matrix.
nonzero(R_counts)
A matrix
An integer matrix where each row is a row, column indices pair
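As a rough base-R analogue of the return value described above, `which()` with `arr.ind = TRUE` also yields one (row, column) pair per nonzero entry. This is a sketch for intuition only; the toy matrix `m` is made up, and the ordering or exact layout returned by nonzero itself may differ.

```r
# Toy matrix with three nonzero entries.
m <- matrix(c(0, 2, 0,
              1, 0, 3), nrow = 2, byrow = TRUE)

# One row per nonzero element: columns are "row" and "col".
idx <- which(m != 0, arr.ind = TRUE)
print(idx)

# The index matrix can be used directly to pull out the nonzero values.
print(m[idx])
```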
Performs normalization, transformation, and/or scaling of a counts matrix
normalizeCounts(
counts,
normalize = c("proportion", "cpm", "median", "mean"),
scaleFactor = NULL,
transformationFun = NULL,
scaleFun = NULL,
pseudocountNormalize = 0,
pseudocountTransform = 0
)
Integer, Numeric or Sparse matrix. Rows represent features and columns represent cells.
Character. Divides counts by the library sizes for each cell. One of 'proportion', 'cpm', 'median', or 'mean'. 'proportion' uses the total counts for each cell as the library size. 'cpm' divides the library size of each cell by one million to produce counts per million. 'median' divides the library size of each cell by the median library size across all cells. 'mean' divides the library size of each cell by the mean library size across all cells.
Numeric. Sets the scale factor for cell-level
normalization. Each cell is multiplied by this scale factor after the
library size of each cell has been adjusted in normalize. Default
NULL which means no scale factor is applied.
Function. Applies a transformation such as sqrt, log, log2, log10, or log1p. If NULL, no transformation will be applied. Occurs after normalization. Default NULL.
Function. Scales the rows of the normalized and transformed count matrix. For example, 'scale' can be used to z-score normalize the rows. Default NULL.
Numeric. Add a pseudocount to counts before normalization. Default 0.
Numeric. Add a pseudocount to normalized counts before applying the transformation function. Adding a pseudocount can be useful before applying a log transformation. Default 0.
Numeric Matrix. A normalized matrix.
data(celdaCGSim)
normalizedCounts <- normalizeCounts(celdaCGSim$counts, "proportion",
pseudocountNormalize = 1)
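The order of operations described for normalizeCounts (normalize, then transform, then scale rows) can be sketched in base R. The toy matrix and variable names below are assumptions for illustration, and the code mirrors only the documented arithmetic for normalize = "proportion", transformationFun = sqrt and scaleFun = scale; it is not the package's implementation.

```r
# Toy counts matrix: rows are features, columns are cells.
counts <- matrix(c(1, 3, 0, 4,
                   2, 2, 5, 1,
                   7, 5, 5, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("Gene_", 1:3), paste0("Cell_", 1:4)))

# "proportion": divide each cell (column) by its total count.
libSize <- colSums(counts)
prop <- sweep(counts, 2, libSize, "/")

# Transformation is applied after normalization.
transformed <- sqrt(prop)

# Row scaling: z-score each feature across cells.
scaled <- t(scale(t(transformed)))
print(round(scaled, 2))
```

After the first step every column of `prop` sums to 1, and after the last step every row of `scaled` has mean 0.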
Retrieves the K/L, model priors (e.g. alpha, beta), and count matrix checksum parameters provided during the creation of the provided celdaModel.
params(celdaMod)
# S4 method for celdaModel
params(celdaMod)
celdaModel. Options available in
celda::availableModels.
List. Contains the model-specific parameters for the provided celda model object depending on its class.
data(celdaCGMod)
params(celdaCGMod)
#> $K
#> [1] 5
#>
#> $L
#> [1] 10
#>
#> $alpha
#> [1] 1
#>
#> $beta
#> [1] 1
#>
#> $delta
#> [1] 1
#>
#> $gamma
#> [1] 1
#>
#> $seed
#> [1] 12345
#>
#> $countChecksum
#> [1] "b47286aed8081daa674f796655314d67"
#>
Perplexity is a statistical measure of how well a probability model can predict new data. Lower perplexity indicates a better model.
perplexity(
x,
celdaMod,
useAssay = "counts",
altExpName = "featureSubset",
newCounts = NULL
)
# S4 method for SingleCellExperiment,ANY
perplexity(
x,
useAssay = "counts",
altExpName = "featureSubset",
newCounts = NULL
)
# S4 method for ANY,celda_CG
perplexity(x, celdaMod, newCounts = NULL)
# S4 method for ANY,celda_C
perplexity(x, celdaMod, newCounts = NULL)
# S4 method for ANY,celda_G
perplexity(x, celdaMod, newCounts = NULL)
Can be one of
A SingleCellExperiment object returned by
celda_C, celda_G or celda_CG, with the matrix
located in the useAssay assay slot.
Rows represent features and columns represent cells.
Integer counts matrix. Rows represent features and columns represent
cells. This matrix should be the same as the one used to generate
celdaMod.
Celda model object. Only works if x is an integer
counts matrix.
A string specifying which assay
slot to use if x is a SingleCellExperiment object.
Default "counts".
The name for the altExp slot to use. Default "featureSubset".
A new counts matrix used to calculate perplexity. If NULL,
perplexity will be calculated for the matrix in useAssay slot in
x. Default NULL.
Numeric. The perplexity for the provided x (and
celdaModel).
data(sceCeldaCG)
perplexity <- perplexity(sceCeldaCG)
data(celdaCGSim, celdaCGMod)
perplexity <- perplexity(celdaCGSim$counts, celdaCGMod)
data(celdaCSim, celdaCMod)
perplexity <- perplexity(celdaCSim$counts, celdaCMod)
data(celdaGSim, celdaGMod)
perplexity <- perplexity(celdaGSim$counts, celdaGMod)
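For intuition about why lower perplexity indicates a better model, the sketch below uses the generic textbook definition, exp(-logLikelihood / totalCounts). The helper `perplexitySketch` and the probability matrices are hypothetical, and how each celda model derives its probabilities is not shown here.

```r
# Generic perplexity: exponentiated negative average log-likelihood per count.
# 'prob' holds the model's probability of each feature in each cell.
perplexitySketch <- function(counts, prob) {
  logLik <- sum(counts * log(prob))
  exp(-logLik / sum(counts))
}

# Toy data: 4 features (rows) by 2 cells (columns).
counts <- matrix(c(5, 1, 0, 2,
                   3, 3, 1, 1), nrow = 4)

# A model that knows nothing: uniform over the 4 features.
uniform <- matrix(1 / 4, nrow = 4, ncol = 2)
uniformPerp <- perplexitySketch(counts, uniform)

# A model closer to the data: smoothed per-cell proportions.
smoothed <- sweep(counts + 0.5, 2, colSums(counts + 0.5), "/")
fittedPerp <- perplexitySketch(counts, smoothed)

print(c(uniform = uniformPerp, fitted = fittedPerp))
```

The uniform model scores a perplexity equal to the number of features, while the better-fitting probabilities score lower.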
Outputs a violin plot for feature expression data.
plotCeldaViolin(
x,
celdaMod,
features,
displayName = NULL,
useAssay = "counts",
altExpName = "featureSubset",
exactMatch = TRUE,
plotDots = TRUE,
dotSize = 0.1
)
# S4 method for SingleCellExperiment
plotCeldaViolin(
x,
features,
displayName = NULL,
useAssay = "counts",
altExpName = "featureSubset",
exactMatch = TRUE,
plotDots = TRUE,
dotSize = 0.1
)
# S4 method for ANY
plotCeldaViolin(
x,
celdaMod,
features,
exactMatch = TRUE,
plotDots = TRUE,
dotSize = 0.1
)
Numeric matrix or a SingleCellExperiment object
with the matrix located in the assay slot under useAssay. Rows
represent features and columns represent cells.
Celda object of class "celda_G" or "celda_CG". Used only if
x is a matrix object.
Character vector. Uses these genes for plotting.
Character. The column name of
rowData(x) that specifies the display names for
the features. Default NULL, which displays the row names. Only works
if x is a SingleCellExperiment object.
A string specifying which assay
slot to use if x is a
SingleCellExperiment object. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Logical. Whether an exact match or a partial match using
grep() is used to look up the feature in the rownames of the counts
matrix. Default TRUE.
Boolean. If TRUE, the
expression of features will be plotted as points in addition to the violin
curve. Default TRUE.
Numeric. Size of points if plotDots = TRUE.
Default 0.1.
Violin plot for each feature, grouped by celda cluster
A scatter plot of the UMAP dimensions generated by DecontX with cells colored by the estimated percentage of contamination.
plotDecontXContamination(
x,
batch = NULL,
colorScale = c("blue", "green", "yellow", "orange", "red"),
size = 1
)
Either a SingleCellExperiment with decontX
results stored in metadata(x)$decontX or the result from running
decontX on a count matrix.
Character. Batch of cells to plot. If NULL, then
the first batch in the list will be selected. Default NULL.
Character vector. Contains the color spectrum to be passed
to scale_colour_gradientn from package 'ggplot2'. Default
c("blue","green","yellow","orange","red").
Numeric. Size of points in the scatterplot. Default 1.
Returns a ggplot object.
See decontX for a full example of how to estimate
and plot contamination.
Generates a violin plot that shows the counts of marker
genes in cells across specific clusters or cell types. Can be used to view
the expression of marker genes in different cell types before and after
decontamination with decontX.
plotDecontXMarkerExpression(
x,
markers,
groupClusters = NULL,
assayName = c("counts", "decontXcounts"),
z = NULL,
exactMatch = TRUE,
by = "rownames",
log1p = FALSE,
ncol = NULL,
plotDots = FALSE,
dotSize = 0.1
)
Either a SingleCellExperiment or a matrix-like object of counts.
Character Vector or List. A character vector or list of character vectors with the names of the marker genes of interest.
List. A named list that allows
cell clusters labels coded in
z to be regrouped and renamed on the fly. For example,
list(Tcells=c(1, 2), Bcells=7) would recode clusters
1 and 2 to "Tcells"
and cluster 7 to "Bcells". Note that if this is used, clusters
in z not found
in groupClusters will be excluded. Default NULL.
Character vector. Name(s) of the assay(s) to
plot if x is a
SingleCellExperiment. If more than one assay is listed, then
side-by-side violin plots will be generated.
Default c("counts", "decontXcounts").
Character, Integer, or Vector.
Indicates the cluster labels for each cell.
If x is a SingleCellExperiment and z = NULL,
then the cluster labels from decontX will be retrieved from the
colData of x (i.e. colData(x)$decontX_clusters).
If z is a single character or integer, then that column will be
retrieved from colData of x (i.e. colData(x)[,z]).
If x is a counts matrix, then z will need to be a vector
the same length as the number of columns in x that indicate
the cluster to which each cell belongs. Default NULL.
Boolean. Whether to only identify exact matches
for the markers or to identify partial matches using grep.
See retrieveFeatureIndex for more details.
Default TRUE.
Character. Where to search for the markers if x is a
SingleCellExperiment. See retrieveFeatureIndex
for more details. If x is a matrix, then this must be set to
"rownames". Default "rownames".
Boolean. Whether to apply the function log1p to the data
before plotting. This function will add a pseudocount of 1 and then log
transform the expression values. Default FALSE.
Integer. Number of columns to make in the plot.
Default NULL.
Boolean. If TRUE, the
expression of features will be plotted as points in addition to the violin
curve. Default FALSE.
Numeric. Size of points if plotDots = TRUE.
Default 0.1.
Returns a ggplot object.
See decontX for a full example of how to estimate
and plot contamination.
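The regrouping behaviour documented for groupClusters can be illustrated in base R. The cluster vector `z` and the lookup approach below are assumptions made for the sketch, not the function's internal code; the point is that labels missing from the list drop out.

```r
# Regroup clusters 1 and 2 as "Tcells" and cluster 7 as "Bcells".
groupClusters <- list(Tcells = c(1, 2), Bcells = 7)

# Hypothetical cluster label for each of six cells.
z <- c(1, 2, 3, 7, 2, 5)

# Build a lookup from old cluster label to new group name.
lookup <- setNames(rep(names(groupClusters), lengths(groupClusters)),
                   unlist(groupClusters))

# Recode; clusters not listed in groupClusters become NA.
newZ <- unname(lookup[as.character(z)])
keep <- !is.na(newZ)

print(newZ[keep])  # cells in clusters 3 and 5 are excluded
```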
Generates a barplot that shows the percentage of
cells within clusters or cell types that have detectable levels
of given marker genes. Can be used to view the expression of
marker genes in different cell types before and after
decontamination with decontX.
Either a SingleCellExperiment or a matrix-like object of counts.
List. A named list indicating the marker genes
for each cell type of
interest. Multiple markers can be supplied for each cell type. For example,
list(Tcell_Markers=c("CD3E", "CD3D"),
Bcell_Markers=c("CD79A", "CD79B", "MS4A1"))
would specify markers for human T-cells and B-cells.
A cell will be considered
"positive" for a cell type if it has a count greater than or equal to threshold
for at least one of the marker genes in the list.
List. A named list that allows
cell clusters labels coded in
z to be regrouped and renamed on the fly. For example,
list(Tcells=c(1, 2), Bcells=7) would recode
clusters 1 and 2 to "Tcells"
and cluster 7 to "Bcells". Note that if this is
used, clusters in z not found
in groupClusters will be excluded from the barplot.
Default NULL.
Character vector. Name(s) of the assay(s) to
plot if x is a
SingleCellExperiment. If more than one assay
is listed, then side-by-side barplots will be generated.
Default c("counts", "decontXcounts").
Character, Integer, or Vector. Indicates the cluster labels
for each cell.
If x is a SingleCellExperiment and z = NULL,
then the cluster labels from decontX will be retrieved from the
colData of x (i.e. colData(x)$decontX_clusters).
If z is a single character or integer,
then that column will be retrieved
from colData of x (i.e. colData(x)[,z]). If x
is a counts matrix, then z will need
to be a vector the same length as
the number of columns in x that indicate
the cluster to which each cell
belongs. Default NULL.
Numeric. Markers greater than or equal to this value will be considered detected in a cell. Default 1.
Boolean. Whether to only identify exact matches
for the markers or to identify partial matches using grep. See
retrieveFeatureIndex for more details. Default TRUE.
Character. Where to search for the markers if x is a
SingleCellExperiment. See retrieveFeatureIndex
for more details. If x is a matrix,
then this must be set to "rownames". Default "rownames".
Integer. Number of columns to make in the plot.
Default round(sqrt(length(markers))).
Boolean. Whether to display percentages above each bar.
Default TRUE.
Numeric. Size of the percentage labels in the barplot. Default 3.
Returns a ggplot object.
See decontX for a full example of how to estimate
and plot contamination.
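The percentage shown in each bar can be approximated with a few lines of base R. The sketch below assumes a toy counts matrix and cluster vector, and uses the "greater than or equal to threshold" rule stated for the threshold argument; it is meant to show the bookkeeping, not to reproduce the function's code.

```r
# Toy counts: rows are marker genes, columns are six cells.
counts <- matrix(c(2, 0, 0, 0, 0, 0,
                   0, 1, 0, 0, 0, 0,
                   0, 0, 0, 0, 3, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("CD3E", "CD3D", "CD79A"),
                                 paste0("Cell_", 1:6)))
z <- c(1, 1, 2, 2, 7, 7)          # hypothetical cluster label per cell
markers <- list(Tcell_Markers = c("CD3E", "CD3D"),
                Bcell_Markers = "CD79A")
threshold <- 1

# A cell is positive for a cell type if any of its markers meets the threshold.
positive <- sapply(markers, function(genes) {
  colSums(counts[genes, , drop = FALSE] >= threshold) > 0
})

# Percentage of positive cells within each cluster (rows) per marker set (columns).
pct <- 100 * apply(positive, 2, function(p) tapply(p, z, mean))
print(pct)
```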
Create a scatterplot for each row of a normalized
gene expression matrix where x and y axis are from a
data dimension reduction tool.
The cells are colored by "celda_cell_cluster" column in
colData(altExp(x, altExpName)) if x is a
SingleCellExperiment object, or x if x is
an integer vector of cell cluster labels.
plotDimReduceCluster(
x,
reducedDimName,
altExpName = "featureSubset",
dim1 = NULL,
dim2 = NULL,
size = 0.5,
xlab = NULL,
ylab = NULL,
specificClusters = NULL,
labelClusters = FALSE,
groupBy = NULL,
labelSize = 3.5
)
# S4 method for SingleCellExperiment
plotDimReduceCluster(
x,
reducedDimName,
altExpName = "featureSubset",
dim1 = 1,
dim2 = 2,
size = 0.5,
xlab = NULL,
ylab = NULL,
specificClusters = NULL,
labelClusters = FALSE,
groupBy = NULL,
labelSize = 3.5
)
# S4 method for vector
plotDimReduceCluster(
x,
dim1,
dim2,
size = 0.5,
xlab = "Dimension_1",
ylab = "Dimension_2",
specificClusters = NULL,
labelClusters = FALSE,
groupBy = NULL,
labelSize = 3.5
)
Integer vector of cell cluster labels or a
SingleCellExperiment object
containing cluster labels for each cell in "celda_cell_cluster"
column in colData(x).
The name of the dimension reduction slot in
reducedDimNames(x) if x is a
SingleCellExperiment object. Ignored if both dim1 and
dim2 are set.
The name for the altExp slot to use. Default "featureSubset".
Integer or numeric vector. If reducedDimName is supplied,
then, this will be used as an index to determine which dimension will be
plotted on the x-axis. If reducedDimName is not supplied, then this
should be a vector which will be plotted on the x-axis. Default 1.
Integer or numeric vector. If reducedDimName is supplied,
then, this will be used as an index to determine which dimension will be
plotted on the y-axis. If reducedDimName is not supplied, then this
should be a vector which will be plotted on the y-axis. Default 2.
Numeric. Sets size of point on plot. Default 0.5.
Character vector. Label for the x-axis. Default NULL.
Character vector. Label for the y-axis. Default NULL.
Numeric vector.
Only color cells in the specified clusters.
All other cells will be grey.
If NULL, all clusters will be colored. Default NULL.
Logical. Whether the cluster labels are plotted. Default FALSE.
Character vector. Contains sample labels for each cell. If NULL, all samples will be plotted together. Default NULL.
Numeric. Sets size of label if labelClusters is TRUE. Default 3.5.
The plot as a ggplot object
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceCluster(x = sce,
reducedDimName = "celda_tSNE",
specificClusters = c(1, 2, 3))
library(SingleCellExperiment)
data(sceCeldaCG, celdaCGMod)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceCluster(x = celdaClusters(celdaCGMod)$z,
dim1 = reducedDim(altExp(sce), "celda_tSNE")[, 1],
dim2 = reducedDim(altExp(sce), "celda_tSNE")[, 2],
specificClusters = c(1, 2, 3))
Create a scatterplot for each row of a normalized gene expression matrix where x and y axis are from a data dimension reduction tool. The cells are colored by expression of the specified feature.
plotDimReduceFeature(
x,
features,
reducedDimName = NULL,
displayName = NULL,
dim1 = NULL,
dim2 = NULL,
headers = NULL,
useAssay = "counts",
altExpName = "featureSubset",
normalize = FALSE,
zscore = TRUE,
exactMatch = TRUE,
trim = c(-2, 2),
limits = c(-2, 2),
size = 0.5,
xlab = NULL,
ylab = NULL,
colorLow = "blue4",
colorMid = "grey90",
colorHigh = "firebrick1",
midpoint = 0,
ncol = NULL,
decreasing = FALSE
)
# S4 method for SingleCellExperiment
plotDimReduceFeature(
x,
features,
reducedDimName = NULL,
displayName = NULL,
dim1 = 1,
dim2 = 2,
headers = NULL,
useAssay = "counts",
altExpName = "featureSubset",
normalize = FALSE,
zscore = TRUE,
exactMatch = TRUE,
trim = c(-2, 2),
limits = c(-2, 2),
size = 0.5,
xlab = NULL,
ylab = NULL,
colorLow = "blue4",
colorMid = "grey90",
colorHigh = "firebrick1",
midpoint = 0,
ncol = NULL,
decreasing = FALSE
)
# S4 method for ANY
plotDimReduceFeature(
x,
features,
dim1,
dim2,
headers = NULL,
normalize = FALSE,
zscore = TRUE,
exactMatch = TRUE,
trim = c(-2, 2),
limits = c(-2, 2),
size = 0.5,
xlab = "Dimension_1",
ylab = "Dimension_2",
colorLow = "blue4",
colorMid = "grey90",
colorHigh = "firebrick1",
midpoint = 0,
ncol = NULL,
decreasing = FALSE
)
Numeric matrix or a SingleCellExperiment object
with the matrix located in the assay slot under useAssay. Rows
represent features and columns represent cells.
Character vector. Features in the rownames of counts to plot.
The name of the dimension reduction slot in
reducedDimNames(x) if x is a
SingleCellExperiment object. If NULL, then both
dim1 and dim2 need to be set. Default NULL.
Character. The column name of
rowData(x) that specifies the display names for
the features. Default NULL, which displays the row names. Only works
if x is a SingleCellExperiment object. Overwrites
headers.
Integer or numeric vector. If reducedDimName is supplied,
then, this will be used as an index to determine which dimension will be
plotted on the x-axis. If reducedDimName is not supplied, then this
should be a vector which will be plotted on the x-axis. Default 1.
Integer or numeric vector. If reducedDimName is supplied,
then, this will be used as an index to determine which dimension will be
plotted on the y-axis. If reducedDimName is not supplied, then this
should be a vector which will be plotted on the y-axis. Default 2.
Character vector. If NULL, the corresponding
rownames are used as labels. Otherwise, these headers are used to label
the features. Only works if displayName is NULL and
exactMatch is FALSE.
A string specifying which assay
slot to use if x is a
SingleCellExperiment object. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Logical. Whether to normalize the columns of `counts`.
Default FALSE.
Logical. Whether to scale each feature to have a mean 0
and standard deviation of 1. Default TRUE.
Logical. Whether an exact match or a partial match using
grep() is used to look up the feature in the rownames of the counts
matrix. Default TRUE.
Numeric vector. Vector of length two that specifies the lower
and upper bounds for the data. This threshold is applied after row scaling.
Set to NULL to disable. Default c(-2,2).
Passed to scale_colour_gradient2. The range of color scale.
Numeric. Sets size of point on plot. Default 0.5.
Character vector. Label for the x-axis. If reducedDimName
is used, then this will be set to the column name of the first dimension of
that object. Default "Dimension_1".
Character vector. Label for the y-axis. If reducedDimName
is used, then this will be set to the column name of the second dimension of
that object. Default "Dimension_2".
Character. A color available from `colors()`. The color will be used to signify the lowest values on the scale.
Character. A color available from `colors()`. The color will be used to signify the midpoint on the scale.
Character. A color available from `colors()`. The color will be used to signify the highest values on the scale.
Numeric. The value indicating the midpoint of the
diverging color scheme. If NULL, defaults to the mean
with 10 percent of values trimmed. Default 0.
Integer. Passed to facet_wrap. Specify the number of columns for facet wrap.
Logical. Specifies the order of plotting the points.
If FALSE, the points will be plotted in increasing order where
the points with largest values will be on top. TRUE otherwise.
If NULL, no sorting is performed. Points will be plotted in their
current order in x. Default FALSE.
The plot as a ggplot object
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceFeature(x = sce,
reducedDimName = "celda_tSNE",
normalize = TRUE,
features = c("Gene_98", "Gene_99"),
exactMatch = TRUE)
library(SingleCellExperiment)
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceFeature(x = counts(sce),
dim1 = reducedDim(altExp(sce), "celda_tSNE")[, 1],
dim2 = reducedDim(altExp(sce), "celda_tSNE")[, 2],
normalize = TRUE,
features = c("Gene_98", "Gene_99"),
exactMatch = TRUE)
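The combined effect of zscore = TRUE and trim = c(-2, 2) can be seen on a toy matrix. This base-R sketch uses made-up values, with gene names borrowed from the example only as labels, and shows the row scaling followed by clamping; it is an illustration rather than the function's own code.

```r
# Toy expression matrix: one feature has a single extreme cell.
expr <- matrix(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 100,
                 1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("Gene_98", "Gene_99"),
                               paste0("Cell_", 1:10)))

# Z-score each feature (row) to mean 0 and standard deviation 1.
z <- t(scale(t(expr)))

# Trim after scaling: clamp values outside the lower and upper bounds.
trim <- c(-2, 2)
z[z < trim[1]] <- trim[1]
z[z > trim[2]] <- trim[2]

print(round(range(z), 3))
```

Without trimming, the outlying cell in Gene_98 would sit near 2.85 and stretch the color scale; after trimming it is capped at 2.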
Creates a scatterplot given two dimensions from a data dimension reduction tool (e.g. tSNE) output.
plotDimReduceGrid(
x,
reducedDimName,
dim1 = NULL,
dim2 = NULL,
useAssay = "counts",
altExpName = "featureSubset",
size = 1,
xlab = "Dimension_1",
ylab = "Dimension_2",
limits = c(-2, 2),
colorLow = "blue4",
colorMid = "grey90",
colorHigh = "firebrick1",
midpoint = 0,
varLabel = NULL,
ncol = NULL,
headers = NULL,
decreasing = FALSE
)
# S4 method for SingleCellExperiment
plotDimReduceGrid(
x,
reducedDimName,
dim1 = NULL,
dim2 = NULL,
useAssay = "counts",
altExpName = "featureSubset",
size = 1,
xlab = "Dimension_1",
ylab = "Dimension_2",
limits = c(-2, 2),
colorLow = "blue4",
colorMid = "grey90",
colorHigh = "firebrick1",
midpoint = 0,
varLabel = NULL,
ncol = NULL,
headers = NULL,
decreasing = FALSE
)
# S4 method for ANY
plotDimReduceGrid(
x,
dim1,
dim2,
size = 1,
xlab = "Dimension_1",
ylab = "Dimension_2",
limits = c(-2, 2),
colorLow = "blue4",
colorMid = "grey90",
colorHigh = "firebrick1",
midpoint = 0,
varLabel = NULL,
ncol = NULL,
headers = NULL,
decreasing = FALSE
)
Numeric matrix or a SingleCellExperiment object
with the matrix located in the assay slot under useAssay. Each
row of the matrix will be plotted as a separate facet.
The name of the dimension reduction slot in
reducedDimNames(x) if x is a
SingleCellExperiment object. Ignored if both dim1 and
dim2 are set.
Numeric vector. First dimension from data dimension reduction output.
Numeric vector. Second dimension from data dimension reduction output.
A string specifying which assay
slot to use if x is a
SingleCellExperiment object. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Numeric. Sets size of point on plot. Default 1.
Character vector. Label for the x-axis. Default 'Dimension_1'.
Character vector. Label for the y-axis. Default 'Dimension_2'.
Passed to scale_colour_gradient2. The range of color scale.
Character. A color available from `colors()`. The color will be used to signify the lowest values on the scale. Default "blue4".
Character. A color available from `colors()`. The color will be used to signify the midpoint on the scale. Default "grey90".
Character. A color available from `colors()`. The color will be used to signify the highest values on the scale. Default "firebrick1".
Numeric. The value indicating the midpoint of the
diverging color scheme. If NULL, defaults to the mean
with 10 percent of values trimmed. Default 0.
Character vector. Title for the color legend.
Integer. Passed to facet_wrap. Specify the number of columns for facet wrap.
Character vector. If `NULL`, the corresponding rownames are used as labels. Otherwise, these headers are used to label the genes.
Logical. Specifies the order of plotting the points.
If FALSE, the points will be plotted in increasing order where
the points with largest values will be on top. TRUE otherwise.
If NULL, no sorting is performed. Points will be plotted in their
current order in x. Default FALSE.
The plot as a ggplot object
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceGrid(x = sce,
reducedDimName = "celda_tSNE",
xlab = "Dimension1",
ylab = "Dimension2",
varLabel = "tSNE")
library(SingleCellExperiment)
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceGrid(x = counts(sce),
dim1 = reducedDim(altExp(sce), "celda_tSNE")[, 1],
dim2 = reducedDim(altExp(sce), "celda_tSNE")[, 2],
xlab = "Dimension1",
ylab = "Dimension2",
varLabel = "tSNE")
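The decreasing argument controls draw order, which matters because later points are drawn over earlier ones. A minimal base-R sketch of that ordering, using a hypothetical vector of per-cell values, might look like this:

```r
# Hypothetical values used to color four cells.
values <- c(Cell_1 = 0.2, Cell_2 = 1.5, Cell_3 = -0.7, Cell_4 = 0.9)

# decreasing = FALSE: sort ascending so the largest value is drawn last (on top).
ordIncreasing <- order(values, decreasing = FALSE)

# decreasing = TRUE: sort descending so the smallest value is drawn last.
ordDecreasing <- order(values, decreasing = TRUE)

print(names(values)[ordIncreasing])
print(names(values)[ordDecreasing])
```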
Create a scatterplot for each row of a normalized gene expression matrix where x and y axis are from a data dimension reduction tool. The cells are colored by the module probability.
plotDimReduceModule(
x,
reducedDimName,
useAssay = "counts",
altExpName = "featureSubset",
celdaMod,
modules = NULL,
dim1 = NULL,
dim2 = NULL,
size = 0.5,
xlab = NULL,
ylab = NULL,
rescale = TRUE,
limits = c(0, 1),
colorLow = "grey90",
colorHigh = "firebrick1",
ncol = NULL,
decreasing = FALSE
)
# S4 method for SingleCellExperiment
plotDimReduceModule(
x,
reducedDimName,
useAssay = "counts",
altExpName = "featureSubset",
modules = NULL,
dim1 = 1,
dim2 = 2,
size = 0.5,
xlab = NULL,
ylab = NULL,
rescale = TRUE,
limits = c(0, 1),
colorLow = "grey90",
colorHigh = "firebrick1",
ncol = NULL,
decreasing = FALSE
)
# S4 method for ANY
plotDimReduceModule(
x,
celdaMod,
modules = NULL,
dim1,
dim2,
size = 0.5,
xlab = "Dimension_1",
ylab = "Dimension_2",
rescale = TRUE,
limits = c(0, 1),
colorLow = "grey90",
colorHigh = "firebrick1",
ncol = NULL,
decreasing = FALSE
)
Numeric matrix or a SingleCellExperiment object
with the matrix located in the assay slot under useAssay. Rows
represent features and columns represent cells.
The name of the dimension reduction slot in
reducedDimNames(x) if x is a
SingleCellExperiment object. Ignored if both dim1 and
dim2 are set.
A string specifying which assay
slot to use if x is a
SingleCellExperiment object. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Celda object of class "celda_G" or "celda_CG". Used only if
x is a matrix object.
Character vector. Module(s) from celda model to be plotted. e.g. c("1", "2").
Integer or numeric vector. If reducedDimName is supplied,
then, this will be used as an index to determine which dimension will be
plotted on the x-axis. If reducedDimName is not supplied, then this
should be a vector which will be plotted on the x-axis. Default 1.
Integer or numeric vector. If reducedDimName is supplied,
then, this will be used as an index to determine which dimension will be
plotted on the y-axis. If reducedDimName is not supplied, then this
should be a vector which will be plotted on the y-axis. Default 2.
Numeric. Sets size of point on plot. Default 0.5.
Character vector. Label for the x-axis. Default "Dimension_1".
Character vector. Label for the y-axis. Default "Dimension_2".
Logical. Whether rows of the matrix should be rescaled to [0, 1]. Default TRUE.
Passed to scale_colour_gradient. The range of color scale.
Character. A color available from `colors()`. The color will be used to signify the lowest values on the scale.
Character. A color available from `colors()`. The color will be used to signify the highest values on the scale.
Integer. Passed to facet_wrap. Specify the number of columns for facet wrap.
Logical. Specifies the order of plotting the points.
If FALSE, the points will be plotted in increasing order where
the points with largest values will be on top. TRUE otherwise.
If NULL, no sorting is performed. Points will be plotted in their
current order in x. Default FALSE.
The plot as a ggplot object
data(sceCeldaCG)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceModule(x = sce,
reducedDimName = "celda_tSNE",
modules = c("1", "2"))
library(SingleCellExperiment)
data(sceCeldaCG, celdaCGMod)
sce <- celdaTsne(sceCeldaCG)
plotDimReduceModule(x = counts(sce),
dim1 = reducedDim(altExp(sce), "celda_tSNE")[, 1],
dim2 = reducedDim(altExp(sce), "celda_tSNE")[, 2],
celdaMod = celdaCGMod,
modules = c("1", "2"))
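What rescale = TRUE means for each row can be sketched with a min-max transform in base R. The matrix `moduleProb` and the helper `rescale01` are hypothetical, and the sketch assumes no row is constant, since that would divide by zero.

```r
# Hypothetical module-by-cell probabilities.
moduleProb <- matrix(c(0.10, 0.40, 0.20, 0.30,
                       0.05, 0.05, 0.60, 0.30),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("L1", "L2"), paste0("Cell_", 1:4)))

# Min-max rescale a vector to the interval [0, 1].
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Apply to each row (module); apply() returns cells in rows, so transpose back.
rescaled <- t(apply(moduleProb, 1, rescale01))
print(round(rescaled, 3))
```

After rescaling, every module spans the full limits = c(0, 1) range, so modules with different absolute probabilities share one color scale.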
Visualize perplexity of every model in a celdaList, by unique K/L combinations
plotGridSearchPerplexity(x, altExpName = "featureSubset", sep = 5, alpha = 0.5)
# S4 method for SingleCellExperiment
plotGridSearchPerplexity(x, altExpName = "featureSubset", sep = 5, alpha = 0.5)
# S4 method for celdaList
plotGridSearchPerplexity(x, sep = 5, alpha = 0.5)
Can be one of
A SingleCellExperiment object returned from
celdaGridSearch, recursiveSplitModule,
or recursiveSplitCell. Must contain a list named
"celda_grid_search" in metadata(x).
celdaList object.
The name for the altExp slot
to use. Default "featureSubset". Only works if x is a
SingleCellExperiment object.
Numeric. Breaks in the x axis of the resulting plot.
Numeric. Passed to geom_jitter. Opacity of the points. Values of alpha range from 0 to 1, with lower values corresponding to more transparent colors.
A ggplot plot object showing perplexity as a function of clustering parameters.
data(sceCeldaCGGridSearch)
sce <- resamplePerplexity(sceCeldaCGGridSearch)
plotGridSearchPerplexity(sce)
data(celdaCGSim, celdaCGGridSearchRes)
## Run various combinations of parameters with 'celdaGridSearch'
celdaCGGridSearchRes <- resamplePerplexity(
celdaCGSim$counts,
celdaCGGridSearchRes)
plotGridSearchPerplexity(celdaCGGridSearchRes)
Renders a heatmap based on a matrix of counts where rows are features and columns are cells.
plotHeatmap(
counts,
z = NULL,
y = NULL,
scaleRow = scale,
trim = c(-2, 2),
featureIx = NULL,
cellIx = NULL,
clusterFeature = TRUE,
clusterCell = TRUE,
colorScheme = c("divergent", "sequential"),
colorSchemeSymmetric = TRUE,
colorSchemeCenter = 0,
col = NULL,
annotationCell = NULL,
annotationFeature = NULL,
annotationColor = NULL,
breaks = NULL,
legend = TRUE,
annotationLegend = TRUE,
annotationNamesFeature = TRUE,
annotationNamesCell = TRUE,
showNamesFeature = FALSE,
showNamesCell = FALSE,
rowGroupOrder = NULL,
colGroupOrder = NULL,
hclustMethod = "ward.D2",
treeheightFeature = ifelse(clusterFeature, 50, 0),
treeheightCell = ifelse(clusterCell, 50, 0),
silent = FALSE,
...
)
Numeric or sparse matrix. Normalized counts matrix where rows represent features and columns represent cells.
Numeric vector. Denotes cell population labels.
Numeric vector. Denotes feature module labels.
Function. A function to scale each individual row. Set to NULL to disable. Occurs after normalization and log transformation. Default is 'scale' and thus will Z-score transform each row.
Numeric vector. Vector of length two that specifies the lower and upper bounds for the data. This threshold is applied after row scaling. Set to NULL to disable. Default c(-2,2).
Integer vector. Select features for display in heatmap. If NULL, no subsetting will be performed. Default NULL.
Integer vector. Select cells for display in heatmap. If NULL, no subsetting will be performed. Default NULL.
Logical. Determines whether rows should be clustered. Default TRUE.
Logical. Determines whether columns should be clustered. Default TRUE.
Character. One of "divergent" or "sequential". A "divergent" scheme is best for highlighting relative data (denoted by 'colorSchemeCenter') such as gene expression data that has been normalized and centered. A "sequential" scheme is best for highlighting data that are ordered low to high such as raw counts or probabilities. Default "divergent".
Logical. When the colorScheme is "divergent"
and the data contains both positive and negative numbers, TRUE indicates
that the color scheme should be symmetric from
[-max(abs(data)), max(abs(data))]. For example, if the data range
goes from -1.5 to 2, then setting this to TRUE will force the color scheme
to range from -2 to 2. Default TRUE.
Numeric. Indicates the center of a "divergent" colorScheme. Default 0.
Color for the heatmap.
Data frame. Additional annotations for each cell will be shown in the column color bars. The format of the data frame should be one row for each cell and one column for each annotation. Numeric variables will be displayed as continuous color bars and factors will be displayed as discrete color bars. Default NULL.
A data frame for the feature annotations (rows).
List. Contains color scheme for all annotations. See `?pheatmap` for more details.
Numeric vector. A sequence of numbers that covers the range of values in the normalized `counts`. Values in the normalized `matrix` are assigned to each bin in `breaks`. Each break is assigned to a unique color from `col`. If NULL, then breaks are calculated automatically. Default NULL.
Logical. Determines whether legend should be drawn. Default TRUE.
Logical. Whether legend for all annotations should be drawn. Default TRUE.
Logical. Whether the names for features should be shown. Default TRUE.
Logical. Whether the names for cells should be shown. Default TRUE.
Logical. Specifies if feature names should be shown. Default FALSE.
Logical. Specifies if cell names should be shown. Default FALSE.
Vector. Specifies the order of feature clusters when
semisupervised clustering is performed on the y labels.
Vector. Specifies the order of cell clusters when
semisupervised clustering is performed on the z labels.
Character. Specifies the method to use for the 'hclust' function. See `?hclust` for possible values. Default "ward.D2".
Numeric. Width of the feature dendrogram. Set to 0 to disable plotting of this dendrogram. Default: if clusterFeature == TRUE, then treeheightFeature = 50, else treeheightFeature = 0.
Numeric. Height of the cell dendrogram. Set to 0 to disable plotting of this dendrogram. Default: if clusterCell == TRUE, then treeheightCell = 50, else treeheightCell = 0.
Logical. If TRUE, the heatmap is not drawn, which is useful when only the returned object is needed. Default FALSE.
Other arguments to be passed to underlying pheatmap function.
A list containing dendrogram information and the heatmap grob.
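The interaction between scaleRow and trim can be illustrated in base R. This is a minimal sketch on a made-up matrix, assuming the default scaleRow = scale and trim = c(-2, 2). It mirrors the documented order (row scaling first, then trimming) and is not the internal plotHeatmap code.

```r
# Toy normalized matrix: 2 features (rows) x 10 cells (columns)
mat <- rbind(
    c(1:9, 100),                       # one extreme value
    c(5, 6, 5, 6, 5, 6, 5, 6, 5, 6)    # mild variation
)

# Z-score each row, as the default scaleRow = scale would
scaled <- t(apply(mat, 1, scale))

# Clamp values to the trim bounds; this step comes after row scaling
trim <- c(-2, 2)
trimmed <- pmin(pmax(scaled, trim[1]), trim[2])

range(scaled)
range(trimmed)
```

Trimming keeps a single extreme cell from dominating the color scale.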
data(celdaCGSim, celdaCGMod)
plotHeatmap(celdaCGSim$counts,
z = celdaClusters(celdaCGMod)$z, y = celdaClusters(celdaCGMod)$y
)
#> TableGrob (5 x 6) "layout": 9 grobs
#> z cells name grob
#> 1 1 (2-2,3-3) col_tree polyline[GRID.polyline.12713]
#> 2 2 (4-4,1-1) row_tree polyline[GRID.polyline.12714]
#> 3 3 (4-4,3-3) matrix gTree[GRID.gTree.12716]
#> 4 4 (3-3,3-3) col_annotation rect[GRID.rect.12717]
#> 5 5 (3-3,4-4) col_annotation_names text[GRID.text.12718]
#> 6 6 (4-4,2-2) row_annotation rect[GRID.rect.12719]
#> 7 7 (5-5,2-2) row_annotation_names text[GRID.text.12720]
#> 8 8 (4-5,6-6) annotationLegend gTree[GRID.gTree.12728]
#> 9 9 (4-5,5-5) legend gTree[GRID.gTree.12731]
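The effect of colorSchemeSymmetric and colorSchemeCenter on a "divergent" scheme can be sketched as follows, using the example from the argument description (data ranging from -1.5 to 2). The variable names and the choice of 101 breaks are assumptions made for illustration; the breaks that plotHeatmap computes internally may be constructed differently.

```r
# Toy scaled values ranging from -1.5 to 2
vals <- c(-1.5, -0.3, 0.8, 2)
center <- 0    # plays the role of colorSchemeCenter

# Symmetric limits around the center: [-max(abs(data)), max(abs(data))]
limit <- max(abs(vals - center))
colorRange <- c(center - limit, center + limit)

# Evenly spaced breaks across the symmetric range
breaks <- seq(colorRange[1], colorRange[2], length.out = 101)

colorRange
```

With symmetric limits, the middle break falls on the center value, so the neutral color of a divergent palette lines up with 0.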
Visualize perplexity differences of every model in a celdaList, by unique K/L combinations.
plotRPC(x, altExpName = "featureSubset", sep = 5, alpha = 0.5)
# S4 method for SingleCellExperiment
plotRPC(x, altExpName = "featureSubset", sep = 5, alpha = 0.5)
# S4 method for celdaList
plotRPC(x, sep = 5, alpha = 0.5)
Can be one of:
A SingleCellExperiment object returned from
celdaGridSearch, recursiveSplitModule,
or recursiveSplitCell. Must contain a list named
"celda_grid_search" in metadata(x).
celdaList object.
The name for the altExp slot to use. Default "featureSubset".
Numeric. Breaks in the x axis of the resulting plot.
Numeric. Passed to geom_jitter. Opacity of the points. Values of alpha range from 0 to 1, with lower values corresponding to more transparent colors.
A ggplot plot object showing perplexity differences as a function of clustering parameters.
data(sceCeldaCGGridSearch)
sce <- resamplePerplexity(sceCeldaCGGridSearch)
plotRPC(sce)
data(celdaCGSim, celdaCGGridSearchRes)
## Run various combinations of parameters with 'celdaGridSearch'
celdaCGGridSearchRes <- resamplePerplexity(
celdaCGSim$counts,
celdaCGGridSearchRes)
plotRPC(celdaCGGridSearchRes)
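One plausible reading of "perplexity differences" is the drop in perplexity between successive values of K, divided by the change in K. The exact formula used by plotRPC is not given in this section, so the sketch below is an assumption built on made-up numbers.

```r
# Toy perplexity values for models with K = 3..7
perp <- data.frame(
    K = c(3, 4, 5, 6, 7),
    perplexity = c(120, 100, 90, 88, 87.5)
)

# Drop in perplexity per unit increase in K (assumed definition)
rpc <- data.frame(
    K = perp$K[-1],
    rpc = -diff(perp$perplexity) / diff(perp$K)
)
print(rpc)
```

Values near zero indicate that adding another cluster barely changes perplexity, which is where an "elbow" would be read off the plot.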
Recode feature module clusters using a mapping in the
from and to arguments.
recodeClusterY(sce, from, to, altExpName = "featureSubset")
SingleCellExperiment object returned from
celda_G or celda_CG. Must contain column
celda_feature_module in
rowData(altExp(sce, altExpName)).
Numeric vector. Unique values in the range of
seq(celdaModules(sce)) that correspond to the original module labels
in sce.
Numeric vector. Unique values in the range of
seq(celdaModules(sce)) that correspond to the new module labels.
The name for the altExp slot to use. Default "featureSubset".
SingleCellExperiment object with recoded feature module labels.
Recode cell subpopulation clusters using a mapping in the
from and to arguments.
recodeClusterZ(sce, from, to, altExpName = "featureSubset")
SingleCellExperiment object returned from
celda_C or celda_CG. Must contain column
celda_cell_cluster in
colData(altExp(sce, altExpName)).
Numeric vector. Unique values in the range of
seq(max(as.integer(celdaClusters(sce, altExpName = altExpName))))
that correspond to the original cluster
labels in sce.
Numeric vector. Unique values in the range of
seq(max(as.integer(celdaClusters(sce, altExpName = altExpName))))
that correspond to the new cluster labels.
The name for the altExp slot to use. Default "featureSubset".
SingleCellExperiment object with recoded cell cluster labels.
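The from/to mapping used by recodeClusterY and recodeClusterZ amounts to a label lookup. The sketch below shows the idea on a plain vector with a hypothetical helper, `recodeLabels`; it is not the celda implementation and does not touch a SingleCellExperiment object.

```r
# Hypothetical helper: replace each label found in `from` with the label at
# the same position in `to`; labels not listed in `from` are left unchanged.
recodeLabels <- function(labels, from, to) {
    stopifnot(
        length(from) == length(to),
        !anyDuplicated(from),
        !anyDuplicated(to)
    )
    idx <- match(labels, from)
    out <- labels
    out[!is.na(idx)] <- to[idx[!is.na(idx)]]
    out
}

z <- c(1, 1, 2, 3, 3, 2)
zNew <- recodeLabels(z, from = c(1, 2, 3), to = c(3, 1, 2))
zNew
```

Because `from` and `to` are both required to be unique here, the mapping is one-to-one and no two clusters are merged by accident.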
Uses the celda_C model to cluster cells into
populations for a range of possible K's. The cell population labels of the
previous "K-1" model are used as the initial values in the current model
with K cell populations. The best split of an existing cell population is
found to create the K-th cluster. This procedure is much faster than
randomly initializing each model with a different K. If module labels for
each feature are given in 'yInit', the celda_CG model will be used to
split cell populations based on those modules instead of individual
features. Module labels will also be updated during sampling and thus
may end up slightly different than yInit.
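The recursive strategy described above can be sketched on toy data. In this sketch, k-means and the within-cluster sum of squares stand in for the celda_C model and its log-likelihood, so it shows only the control flow (reuse the K-1 labels, test a split of every existing cluster, keep the best one). The names `splitBest` and `wss` are hypothetical.

```r
# Toy sketch only: k-means and within-cluster sum of squares replace the
# celda_C model and its log-likelihood.
set.seed(12345)
x <- c(rnorm(30, 0, 0.3), rnorm(30, 5, 0.3),
    rnorm(30, 10, 0.3), rnorm(30, 15, 0.3))

wss <- function(v) sum((v - mean(v))^2)

# Try splitting each existing cluster in two and keep the split that lowers
# the total within-cluster sum of squares the most.
splitBest <- function(x, z, minCell = 3) {
    best <- list(gain = -Inf, z = z)
    for (k in unique(z)) {
        ix <- which(z == k)
        if (length(ix) < minCell) next
        km <- stats::kmeans(x[ix], centers = 2, nstart = 5)
        gain <- wss(x[ix]) - km$tot.withinss
        if (gain > best$gain) {
            zNew <- z
            zNew[ix[km$cluster == 2]] <- max(z) + 1
            best <- list(gain = gain, z = zNew)
        }
    }
    best$z
}

# Start from initialK and reuse the K - 1 labels to build each K model
initialK <- 2
maxK <- 4
z <- stats::kmeans(x, centers = initialK, nstart = 5)$cluster
models <- list()
models[[as.character(initialK)]] <- z
for (K in seq(initialK + 1, maxK)) {
    z <- splitBest(x, z)
    models[[as.character(K)]] <- z
}
nClusters <- vapply(models, function(z) length(unique(z)), integer(1))
nClusters
```

Each model adds exactly one cluster to the previous one, which is why this approach avoids a fresh random initialization for every K.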
recursiveSplitCell(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
initialK = 5,
maxK = 25,
tempL = NULL,
yInit = NULL,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
minCell = 3,
reorder = TRUE,
seed = 12345,
perplexity = TRUE,
doResampling = FALSE,
numResample = 5,
logfile = NULL,
verbose = TRUE
)
# S4 method for SingleCellExperiment
recursiveSplitCell(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
initialK = 5,
maxK = 25,
tempL = NULL,
yInit = NULL,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
minCell = 3,
reorder = TRUE,
seed = 12345,
perplexity = TRUE,
doResampling = FALSE,
numResample = 5,
logfile = NULL,
verbose = TRUE
)
# S4 method for matrix
recursiveSplitCell(
x,
useAssay = "counts",
altExpName = "featureSubset",
sampleLabel = NULL,
initialK = 5,
maxK = 25,
tempL = NULL,
yInit = NULL,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
minCell = 3,
reorder = TRUE,
seed = 12345,
perplexity = TRUE,
doResampling = FALSE,
numResample = 5,
logfile = NULL,
verbose = TRUE
)
A numeric matrix of counts or a
SingleCellExperiment
with the matrix located in the assay slot under useAssay.
Rows represent features and columns represent cells.
A string specifying the name of the assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Vector or factor. Denotes the sample label for each cell (column) in the count matrix.
Integer. Initial number of cell populations to try.
Default 5.
Integer. Maximum number of cell populations to try.
Default 25.
Integer. Number of temporary modules to identify and use in cell
splitting. Only used if yInit = NULL. Collapsing features to a
relatively smaller number of modules will increase the speed of clustering
and tend to produce better cell populations. This number should be larger
than the number of true modules expected in the dataset. Default
NULL.
Integer vector. Module labels for features. Cells will be
clustered using the celda_CG model based on the modules specified in
yInit rather than the counts of individual features. While the
features will be initialized to the module labels in yInit, the
labels will be allowed to move within each new model with a different K.
Numeric. Concentration parameter for Theta. Adds a pseudocount
to each cell population in each sample. Default 1.
Numeric. Concentration parameter for Phi. Adds a pseudocount to
each feature in each cell (if yInit is NULL) or to each module in
each cell population (if yInit is set). Default 1.
Numeric. Concentration parameter for Psi. Adds a pseudocount
to each feature in each module. Only used if yInit is set. Default 1.
Numeric. Concentration parameter for Eta. Adds a pseudocount
to the number of features in each module. Only used if yInit is set.
Default 1.
Integer. Only attempt to split cell populations with at least this many cells. Default 3.
Logical. Whether to reorder cell populations using hierarchical clustering after each model has been created. If FALSE, cell population numbers will correspond to the split which created the cell populations (i.e. 'K15' was created at split 15, 'K16' was created at split 16, etc.). Default TRUE.
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
Logical. Whether to calculate perplexity for each model. If FALSE, then perplexity can be calculated later with resamplePerplexity. Default TRUE.
Boolean. If TRUE, then each cell in the counts
matrix will be resampled according to a multinomial distribution to introduce
noise before calculating perplexity. Default FALSE.
Integer. The number of times to resample the counts matrix
for evaluating perplexity if doResampling is set to TRUE.
Default 5.
Character. Messages will be redirected to a file named "logfile". If NULL, messages will be printed to stdout. Default NULL.
Logical. Whether to print log messages. Default TRUE.
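The resampling step controlled by doResampling and numResample can be pictured as follows. This is a sketch under the assumption that each cell is redrawn from a multinomial distribution with its observed total and observed feature proportions; `resampleCounts` is a hypothetical helper, and the procedure inside celda may differ (for example, in how zero counts are handled).

```r
set.seed(12345)
counts <- matrix(
    c(10, 0, 5,
      3, 7, 0,
      2, 3, 15),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("Gene_", 1:3), paste0("Cell_", 1:3))
)

# Resample each cell (column): draw the same total number of counts from a
# multinomial whose probabilities are the cell's observed proportions.
resampleCounts <- function(counts) {
    out <- apply(counts, 2, function(col) {
        stats::rmultinom(n = 1, size = sum(col), prob = col)
    })
    dimnames(out) <- dimnames(counts)
    out
}

resampled <- resampleCounts(counts)
resampled
```

The total count of each cell is preserved while individual feature counts vary, which introduces noise before perplexity is calculated.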
A SingleCellExperiment object. Function parameter settings and celda model results are stored in the
metadata
"celda_grid_search" slot. The models in
the list will be of class celda_C if yInit = NULL or
celda_CG if yInit is set.
recursiveSplitModule for recursive splitting of feature modules.
data(sceCeldaCG)
## Create models that range from K = 3 to K = 7 by recursively splitting
## cell populations into two to produce celda_C cell clustering models
sce <- recursiveSplitCell(sceCeldaCG, initialK = 3, maxK = 7)
#> ==================================================
#> Starting recursive cell population splitting.
#> ==================================================
#> Tue Apr 2 18:56:32 2024 .. Initializing with 3 populations
#> Tue Apr 2 18:56:32 2024 .. Current cell population 4 | logLik: -1225755.01101897
#> Tue Apr 2 18:56:32 2024 .. Current cell population 5 | logLik: -1213677.60126784
#> Tue Apr 2 18:56:32 2024 .. Current cell population 6 | logLik: -1213903.59449854
#> Tue Apr 2 18:56:32 2024 .. Current cell population 7 | logLik: -1214081.54311397
#> Tue Apr 2 18:56:32 2024 .. Calculating perplexity
#> ==================================================
#> Completed recursive cell population splitting. Total time: 0.3127432 secs
#> ==================================================
## Alternatively, first identify feature modules using
## recursiveSplitModule
moduleSplit <- recursiveSplitModule(sceCeldaCG, initialL = 3, maxL = 15)
#> ==================================================
#> Starting recursive module splitting.
#> ==================================================
#> Tue Apr 2 18:56:32 2024 .. Collapsing to 100 temporary cell populations
#> Tue Apr 2 18:56:34 2024 .. Initializing with 3 modules
#> Tue Apr 2 18:56:35 2024 .. Created module 4 | logLik: -1241379.90928455
#> Tue Apr 2 18:56:35 2024 .. Created module 5 | logLik: -1235212.7977535
#> Tue Apr 2 18:56:35 2024 .. Created module 6 | logLik: -1232789.9817561
#> Tue Apr 2 18:56:35 2024 .. Created module 7 | logLik: -1227246.66090571
#> Tue Apr 2 18:56:35 2024 .. Created module 8 | logLik: -1223898.757694
#> Tue Apr 2 18:56:35 2024 .. Created module 9 | logLik: -1221848.26936098
#> Tue Apr 2 18:56:35 2024 .. Created module 10 | logLik: -1220147.96681948
#> Tue Apr 2 18:56:35 2024 .. Created module 11 | logLik: -1220818.37022325
#> Tue Apr 2 18:56:35 2024 .. Created module 12 | logLik: -1221489.07685946
#> Tue Apr 2 18:56:35 2024 .. Created module 13 | logLik: -1222032.53497571
#> Tue Apr 2 18:56:36 2024 .. Created module 14 | logLik: -1222712.17543857
#> Tue Apr 2 18:56:36 2024 .. Created module 15 | logLik: -1223268.97596756
#> Tue Apr 2 18:56:36 2024 .. Calculating perplexity
#> ==================================================
#> Completed recursive module splitting. Total time: 3.285614 secs
#> ==================================================
plotGridSearchPerplexity(moduleSplit)
moduleSplitSelect <- subsetCeldaList(moduleSplit, list(L = 10))
## Then use module labels for initialization in recursiveSplitCell to
## produce celda_CG bi-clustering models
cellSplit <- recursiveSplitCell(sceCeldaCG,
initialK = 3, maxK = 7, yInit = celdaModules(moduleSplitSelect))
#> ==================================================
#> Starting recursive cell population splitting.
#> ==================================================
#> Tue Apr 2 18:56:36 2024 .. Collapsing to 10 modules
#> Tue Apr 2 18:56:36 2024 .. Initializing with 3 populations
#> Tue Apr 2 18:56:37 2024 .. Current cell population 4 | logLik: -1225286.49558716
#> Tue Apr 2 18:56:37 2024 .. Current cell population 5 | logLik: -1212955.15575681
#> Tue Apr 2 18:56:37 2024 .. Current cell population 6 | logLik: -1212982.74290613
#> Tue Apr 2 18:56:37 2024 .. Current cell population 7 | logLik: -1213005.40337891
#> Tue Apr 2 18:56:37 2024 .. Calculating perplexity
#> ==================================================
#> Completed recursive cell population splitting. Total time: 1.227239 secs
#> ==================================================
plotGridSearchPerplexity(cellSplit)
sce <- subsetCeldaList(cellSplit, list(K = 5, L = 10))
data(celdaCGSim, celdaCSim)
## Create models that range from K = 3 to K = 7 by recursively splitting
## cell populations into two to produce celda_C cell clustering models
sce <- recursiveSplitCell(celdaCSim$counts, initialK = 3, maxK = 7)
#> ==================================================
#> Starting recursive cell population splitting.
#> ==================================================
#> Tue Apr 2 18:56:38 2024 .. Initializing with 3 populations
#> Tue Apr 2 18:56:38 2024 .. Current cell population 4 | logLik: -1341630.1679001
#> Tue Apr 2 18:56:38 2024 .. Current cell population 5 | logLik: -1327506.91718317
#> Tue Apr 2 18:56:38 2024 .. Current cell population 6 | logLik: -1315227.54586167
#> Tue Apr 2 18:56:38 2024 .. Current cell population 7 | logLik: -1304393.65802293
#> Tue Apr 2 18:56:38 2024 .. Calculating perplexity
#> ==================================================
#> Completed recursive cell population splitting. Total time: 0.2604418 secs
#> ==================================================
## Alternatively, first identify feature modules using
## recursiveSplitModule
moduleSplit <- recursiveSplitModule(celdaCGSim$counts,
initialL = 3, maxL = 15)
#> ==================================================
#> Starting recursive module splitting.
#> ==================================================
#> Tue Apr 2 18:56:39 2024 .. Collapsing to 100 temporary cell populations
#> Tue Apr 2 18:56:40 2024 .. Initializing with 3 modules
#> Tue Apr 2 18:56:40 2024 .. Created module 4 | logLik: -1243396.62348886
#> Tue Apr 2 18:56:40 2024 .. Created module 5 | logLik: -1237610.11790137
#> Tue Apr 2 18:56:40 2024 .. Created module 6 | logLik: -1232128.87013396
#> Tue Apr 2 18:56:40 2024 .. Created module 7 | logLik: -1227611.8250329
#> Tue Apr 2 18:56:40 2024 .. Created module 8 | logLik: -1225618.06184004
#> Tue Apr 2 18:56:40 2024 .. Created module 9 | logLik: -1223967.77531912
#> Tue Apr 2 18:56:41 2024 .. Created module 10 | logLik: -1222801.11395987
#> Tue Apr 2 18:56:41 2024 .. Created module 11 | logLik: -1223402.66903597
#> Tue Apr 2 18:56:41 2024 .. Created module 12 | logLik: -1224026.19892208
#> Tue Apr 2 18:56:41 2024 .. Created module 13 | logLik: -1224675.63005464
#> Tue Apr 2 18:56:41 2024 .. Created module 14 | logLik: -1225317.91966369
#> Tue Apr 2 18:56:41 2024 .. Created module 15 | logLik: -1225971.50555157
#> Tue Apr 2 18:56:41 2024 .. Calculating perplexity
#> ==================================================
#> Completed recursive module splitting. Total time: 2.560257 secs
#> ==================================================
plotGridSearchPerplexity(moduleSplit)
moduleSplitSelect <- subsetCeldaList(moduleSplit, list(L = 10))
## Then use module labels for initialization in recursiveSplitCell to
## produce celda_CG bi-clustering models
cellSplit <- recursiveSplitCell(celdaCGSim$counts,
initialK = 3, maxK = 7, yInit = celdaModules(moduleSplitSelect))
#> ==================================================
#> Starting recursive cell population splitting.
#> ==================================================
#> Tue Apr 2 18:56:42 2024 .. Collapsing to 10 modules
#> Tue Apr 2 18:56:42 2024 .. Initializing with 3 populations
#> Tue Apr 2 18:56:44 2024 .. Current cell population 4 | logLik: -1227944.5458832
#> Tue Apr 2 18:56:44 2024 .. Current cell population 5 | logLik: -1215605.08613503
#> Tue Apr 2 18:56:44 2024 .. Current cell population 6 | logLik: -1215627.62281773
#> Tue Apr 2 18:56:44 2024 .. Current cell population 7 | logLik: -1215651.32538066
#> Tue Apr 2 18:56:44 2024 .. Calculating perplexity
#> ==================================================
#> Completed recursive cell population splitting. Total time: 2.064909 secs
#> ==================================================
plotGridSearchPerplexity(cellSplit)
sce <- subsetCeldaList(cellSplit, list(K = 5, L = 10))
Uses the celda_G model to cluster features into modules for a range of possible L's. The module labels of the previous "L-1" model are used as the initial values in the current model with L modules. The best split of an existing module is found to create the L-th module. This procedure is much faster than randomly initializing each model with a different L.
recursiveSplitModule(
x,
useAssay = "counts",
altExpName = "featureSubset",
initialL = 10,
maxL = 100,
tempK = 100,
zInit = NULL,
sampleLabel = NULL,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
minFeature = 3,
reorder = TRUE,
seed = 12345,
perplexity = TRUE,
doResampling = FALSE,
numResample = 5,
verbose = TRUE,
logfile = NULL
)
# S4 method for SingleCellExperiment
recursiveSplitModule(
x,
useAssay = "counts",
altExpName = "featureSubset",
initialL = 10,
maxL = 100,
tempK = 100,
zInit = NULL,
sampleLabel = NULL,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
minFeature = 3,
reorder = TRUE,
seed = 12345,
perplexity = TRUE,
doResampling = FALSE,
numResample = 5,
verbose = TRUE,
logfile = NULL
)
# S4 method for matrix
recursiveSplitModule(
x,
useAssay = "counts",
altExpName = "featureSubset",
initialL = 10,
maxL = 100,
tempK = 100,
zInit = NULL,
sampleLabel = NULL,
alpha = 1,
beta = 1,
delta = 1,
gamma = 1,
minFeature = 3,
reorder = TRUE,
seed = 12345,
perplexity = TRUE,
doResampling = FALSE,
numResample = 5,
verbose = TRUE,
logfile = NULL
)
A numeric matrix of counts or a
SingleCellExperiment
with the matrix located in the assay slot under useAssay.
Rows represent features and columns represent cells.
A string specifying which assay
slot to use if x is a
SingleCellExperiment object. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Integer. Initial number of modules. Default 10.
Integer. Maximum number of modules. Default 100.
Integer. Number of temporary cell populations to identify and
use in module splitting. Only used if zInit = NULL. Collapsing cells
to a relatively smaller number of cell populations will increase the
speed of module clustering and tend to produce better modules. This number
should be larger than the number of true cell populations expected in the
dataset. Default 100.
Integer vector. Collapse cells to cell populations based on
labels in zInit and then perform module splitting. If NULL, no
collapsing will be performed unless tempK is specified.
Default NULL.
Vector or factor. Denotes the sample label for each cell
(column) in the count matrix. Default NULL.
Numeric. Concentration parameter for Theta. Adds a pseudocount
to each cell population in each sample. Only used if zInit is set.
Default 1.
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell. Default 1.
Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1.
Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1.
Integer. Only attempt to split modules with at least this many features.
Logical. Whether to reorder modules using hierarchical clustering after each model has been created. If FALSE, module numbers will correspond to the split which created the module (i.e. 'L15' was created at split 15, 'L16' was created at split 16, etc.). Default TRUE.
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
Logical. Whether to calculate perplexity for each model.
If FALSE, then perplexity can be calculated later with
resamplePerplexity. Default TRUE.
Boolean. If TRUE, then each cell in the counts
matrix will be resampled according to a multinomial distribution to introduce
noise before calculating perplexity. Default FALSE.
Integer. The number of times to resample the counts matrix
for evaluating perplexity if doResampling is set to TRUE.
Default 5.
Logical. Whether to print log messages. Default TRUE.
Character. Messages will be redirected to a file named "logfile". If NULL, messages will be printed to stdout. Default NULL.
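Collapsing cells to cell populations, as described for tempK and zInit, can be thought of as summing the count columns that share a population label. The sketch below shows that operation with base R's `rowsum` on a made-up matrix; the labels in `z` are hypothetical, and celda's internal routine may be implemented differently.

```r
# Toy count matrix: 3 features (rows) x 4 cells (columns)
counts <- matrix(1:12, nrow = 3,
    dimnames = list(paste0("Gene_", 1:3), paste0("Cell_", 1:4)))

# Hypothetical cell population labels, one per cell (as in zInit)
z <- c(1, 2, 1, 2)

# rowsum() sums rows by group, so transpose to group the cells (columns)
collapsed <- t(rowsum(t(counts), group = z))
collapsed
```

Module splitting then operates on a features-by-populations matrix, which has far fewer columns than the original features-by-cells matrix.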
A SingleCellExperiment object. Function parameter settings and celda model results are stored in the
metadata
"celda_grid_search" slot. The models in
the list will be of class celda_G if zInit = NULL or
celda_CG if zInit is set.
recursiveSplitCell for recursive splitting of cell
populations.
data(sceCeldaCG)
## Create models that range from L=3 to L=20 by recursively splitting modules
## into two
moduleSplit <- recursiveSplitModule(sceCeldaCG, initialL = 3, maxL = 20)
#> ==================================================
#> Starting recursive module splitting.
#> ==================================================
#> Tue Apr 2 18:56:46 2024 .. Collapsing to 100 temporary cell populations
#> Tue Apr 2 18:56:47 2024 .. Initializing with 3 modules
#> Tue Apr 2 18:56:47 2024 .. Created module 4 | logLik: -1241379.90928455
#> Tue Apr 2 18:56:47 2024 .. Created module 5 | logLik: -1235212.7977535
#> Tue Apr 2 18:56:47 2024 .. Created module 6 | logLik: -1232789.9817561
#> Tue Apr 2 18:56:48 2024 .. Created module 7 | logLik: -1227246.66090571
#> Tue Apr 2 18:56:48 2024 .. Created module 8 | logLik: -1223898.757694
#> Tue Apr 2 18:56:48 2024 .. Created module 9 | logLik: -1221848.26936098
#> Tue Apr 2 18:56:48 2024 .. Created module 10 | logLik: -1220147.96681948
#> Tue Apr 2 18:56:48 2024 .. Created module 11 | logLik: -1220818.37022325
#> Tue Apr 2 18:56:48 2024 .. Created module 12 | logLik: -1221489.07685946
#> Tue Apr 2 18:56:48 2024 .. Created module 13 | logLik: -1222032.53497571
#> Tue Apr 2 18:56:48 2024 .. Created module 14 | logLik: -1222712.17543857
#> Tue Apr 2 18:56:48 2024 .. Created module 15 | logLik: -1223268.97596756
#> Tue Apr 2 18:56:48 2024 .. Created module 16 | logLik: -1223841.4834406
#> Tue Apr 2 18:56:49 2024 .. Created module 17 | logLik: -1224394.02513994
#> Tue Apr 2 18:56:49 2024 .. Created module 18 | logLik: -1224863.41435811
#> Tue Apr 2 18:56:49 2024 .. Created module 19 | logLik: -1225480.30453125
#> Tue Apr 2 18:56:49 2024 .. Created module 20 | logLik: -1226156.47078695
#> Tue Apr 2 18:56:49 2024 .. Calculating perplexity
#> ==================================================
#> Completed recursive module splitting. Total time: 3.40894 secs
#> ==================================================
## Example results with perplexity
plotGridSearchPerplexity(moduleSplit)
## Select model for downstream analysis
celdaMod <- subsetCeldaList(moduleSplit, list(L = 10))
data(celdaCGSim)
## Create models that range from L=3 to L=20 by recursively splitting modules
## into two
moduleSplit <- recursiveSplitModule(celdaCGSim$counts,
initialL = 3, maxL = 20)
#> ==================================================
#> Starting recursive module splitting.
#> ==================================================
#> Tue Apr 2 18:56:50 2024 .. Collapsing to 100 temporary cell populations
#> Tue Apr 2 18:56:51 2024 .. Initializing with 3 modules
#> Tue Apr 2 18:56:51 2024 .. Created module 4 | logLik: -1243396.62348886
#> Tue Apr 2 18:56:51 2024 .. Created module 5 | logLik: -1237610.11790137
#> Tue Apr 2 18:56:51 2024 .. Created module 6 | logLik: -1232128.87013396
#> Tue Apr 2 18:56:51 2024 .. Created module 7 | logLik: -1227611.8250329
#> Tue Apr 2 18:56:51 2024 .. Created module 8 | logLik: -1225618.06184004
#> Tue Apr 2 18:56:52 2024 .. Created module 9 | logLik: -1223967.77531912
#> Tue Apr 2 18:56:52 2024 .. Created module 10 | logLik: -1222801.11395987
#> Tue Apr 2 18:56:52 2024 .. Created module 11 | logLik: -1223402.66903597
#> Tue Apr 2 18:56:52 2024 .. Created module 12 | logLik: -1224026.19892208
#> Tue Apr 2 18:56:52 2024 .. Created module 13 | logLik: -1224675.63005464
#> Tue Apr 2 18:56:52 2024 .. Created module 14 | logLik: -1225317.91966369
#> Tue Apr 2 18:56:52 2024 .. Created module 15 | logLik: -1225971.50555157
#> Tue Apr 2 18:56:52 2024 .. Created module 16 | logLik: -1226557.7881506
#> Tue Apr 2 18:56:52 2024 .. Created module 17 | logLik: -1227080.13473523
#> Tue Apr 2 18:56:53 2024 .. Created module 18 | logLik: -1227603.99622355
#> Tue Apr 2 18:56:53 2024 .. Created module 19 | logLik: -1228247.84169741
#> Tue Apr 2 18:56:53 2024 .. Created module 20 | logLik: -1228828.70617002
#> Tue Apr 2 18:56:53 2024 .. Calculating perplexity
#> ==================================================
#> Completed recursive module splitting. Total time: 3.238301 secs
#> ==================================================
## Example results with perplexity
plotGridSearchPerplexity(moduleSplit)
## Select model for downstream analysis
celdaMod <- subsetCeldaList(moduleSplit, list(L = 10))
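The log-likelihoods printed in the first example log above (the sceCeldaCG run) can also be inspected directly. The sketch below copies those values, rounded to two decimals, into a data frame and finds where the log-likelihood peaks. This is only a quick heuristic added for illustration; the examples themselves choose L from the perplexity plot.

```r
# Log-likelihoods copied (rounded) from the first example log above
trace <- data.frame(
    L = 4:20,
    logLik = c(
        -1241379.91, -1235212.80, -1232789.98, -1227246.66, -1223898.76,
        -1221848.27, -1220147.97, -1220818.37, -1221489.08, -1222032.53,
        -1222712.18, -1223268.98, -1223841.48, -1224394.03, -1224863.41,
        -1225480.30, -1226156.47
    )
)

# The log-likelihood peaks at L = 10 and declines afterwards
bestL <- trace$L[which.max(trace$logLik)]
bestL
```

In this run the peak coincides with the L = 10 model that the example selects with subsetCeldaList.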
Apply hierarchical clustering to reorder the cell populations and/or feature modules and group similar ones together based on the cosine distance of the factorized matrix from factorizeMatrix.
reorderCelda(
x,
celdaMod,
useAssay = "counts",
altExpName = "featureSubset",
method = "complete"
)
# S4 method for SingleCellExperiment,ANY
reorderCelda(
x,
useAssay = "counts",
altExpName = "featureSubset",
method = "complete"
)
# S4 method for matrix,celda_CG
reorderCelda(x, celdaMod, method = "complete")
# S4 method for matrix,celda_C
reorderCelda(x, celdaMod, method = "complete")
# S4 method for matrix,celda_G
reorderCelda(x, celdaMod, method = "complete")
Can be one of:
A SingleCellExperiment object returned by
celda_C, celda_G or celda_CG, with the matrix
located in the useAssay assay slot in altExp(x, altExpName).
Rows represent features and columns represent cells.
Integer count matrix. Rows represent features and columns represent
cells. This matrix should be the same as the one used to generate
celdaMod.
Celda model object. Only works if x is an integer
counts matrix. Ignored if x is a
SingleCellExperiment object.
A string specifying which assay
slot to use if x is a SingleCellExperiment object.
Default "counts".
The name for the altExp slot. Default "featureSubset".
Passed to hclust. The agglomeration method to be used. Default "complete".
A SingleCellExperiment object (or Celda model object) with updated cell cluster and/or feature module labels.
data(sceCeldaCG)
reordersce <- reorderCelda(sceCeldaCG)
#> Cluster labels are converted to factors.
#> Module labels are converted to factors.
data(celdaCGSim, celdaCGMod)
reorderCeldaCG <- reorderCelda(celdaCGSim$counts, celdaCGMod)
data(celdaCSim, celdaCMod)
reorderCeldaC <- reorderCelda(celdaCSim$counts, celdaCMod)
data(celdaGSim, celdaGMod)
reorderCeldaG <- reorderCelda(celdaGSim$counts, celdaGMod)
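The reordering idea (cosine distance on a factorized matrix, followed by hierarchical clustering) can be sketched in base R. The matrix `fm`, the helper `cosineDist`, and the relabeling with `match` are assumptions made for illustration; reorderCelda works on the output of factorizeMatrix and may relabel differently.

```r
# Toy factorized matrix: rows are modules, columns are cell populations.
# K1 and K3 have similar profiles, as do K2 and K4.
fm <- matrix(
    c(0.7, 0.1, 0.6, 0.1,
      0.2, 0.1, 0.3, 0.2,
      0.1, 0.8, 0.1, 0.7),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("L", 1:3), paste0("K", 1:4))
)

# Cosine distance between columns: 1 - cosine similarity
cosineDist <- function(m) {
    norms <- sqrt(colSums(m^2))
    sim <- crossprod(m) / outer(norms, norms)
    stats::as.dist(1 - sim)
}

hc <- stats::hclust(cosineDist(fm), method = "complete")

# hc$order lists the old labels in dendrogram order; the old label at
# position i becomes new label i, so similar populations become neighbors.
z <- c(1, 2, 3, 4, 1, 3, 2, 4)
zNew <- match(z, hc$order)
zNew
```

After relabeling, populations with similar profiles receive consecutive numbers, which keeps them next to each other in heatmaps and other ordered displays.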
reportCeldaCGRun will run recursiveSplitModule and
recursiveSplitCell to find the number of modules (L) and the
number of cell populations (K). A final celda_CG model will
be selected from recursiveSplitCell. After a celda_CG model
has been fit, reportCeldaCGPlotResults can be used to create an HTML
report for visualization and exploration of the celda_CG model
results. Some of the plotting and feature selection functions require the
installation of the Bioconductor package singleCellTK.
reportCeldaCGRun(
sce,
L,
K,
sampleLabel = NULL,
altExpName = "featureSubset",
useAssay = "counts",
initialL = 10,
maxL = 150,
initialK = 5,
maxK = 50,
minCell = 3,
minCount = 3,
maxFeatures = 5000,
output_file = "CeldaCG_RunReport",
output_sce_prefix = "celda_cg",
output_dir = ".",
pdf = FALSE,
showSession = TRUE
)
reportCeldaCGPlotResults(
sce,
reducedDimName,
features = NULL,
displayName = NULL,
altExpName = "featureSubset",
useAssay = "counts",
cellAnnot = NULL,
cellAnnotLabel = NULL,
exactMatch = TRUE,
moduleFilePrefix = "module_features",
output_file = "CeldaCG_ResultReport",
output_dir = ".",
pdf = FALSE,
showSetup = TRUE,
showSession = TRUE
)
A SingleCellExperiment with the matrix located in
the assay slot under useAssay. Rows represent features and columns
represent cells.
Integer. Final number of feature modules. See celda_CG for
more information.
Integer. Final number of cell populations. See celda_CG for
more information.
Vector or factor. Denotes the sample label for each cell (column) in the count matrix.
The name for the altExp slot to use. Default
"featureSubset".
A string specifying which assay slot to use. Default
"counts".
Integer. Minimum number of modules to try. See
recursiveSplitModule for more information. Default 10.
Integer. Maximum number of modules to try. See
recursiveSplitModule for more information. Default 150.
Integer. Initial number of cell populations to try.
Integer. Maximum number of cell populations to try.
Integer. Minimum number of cells required for feature
selection. See selectFeatures for more information. Default
3.
Integer. Minimum number of counts required for feature
selection. See selectFeatures for more information. Default
3.
Integer. Maximum number of features to include. If the
number of features after filtering for minCell and minCount
is greater than maxFeatures, then Seurat's VST function is used to
select the top variable features. Default 5000.
Character. Prefix of the html file. Default
"CeldaCG_RunReport" for reportCeldaCGRun and "CeldaCG_ResultReport" for
reportCeldaCGPlotResults.
Character. The sce object with
celda_CG results will be saved to an .rds file starting with
this prefix. Default "celda_cg".
Character. Path to save the html file. Default ".".
Boolean. Whether to create PDF versions of each plot in addition
to PNGs. Default FALSE.
Boolean. Whether to show the session information at the
end. Default TRUE.
Character. Name of the reduced dimensional object to be
used in 2-D scatter plots throughout the report. Default celda_UMAP.
Character vector. Expression of these features will be
displayed on a reduced dimensional plot defined by reducedDimName.
If NULL, then no plotting of features on a reduced dimensional plot
will be performed. Default NULL.
Character. The name to use for display in scatter plots
and heatmaps. If NULL, then the rownames of the sce object
will be used. This can also be set to the name of a column in the row data
of sce or altExp(sce, altExpName). Default NULL.
Character vector. The cell-level annotations to display on
the reduced dimensional plot. These variables should be present in the
column data of the sce object. Default NULL.
Character vector. Additional cell-level annotations
to display on the reduced dimensional plot. Variables will be treated
as categorical and labels for each group will be placed on the plot.
These variables should be present in the column data of the sce
object. Default NULL.
Boolean. Whether to only identify exact matches or to
identify partial matches using grep. Default TRUE.
Character. The features in each module will be
written to a csv file starting with this name. If NULL, then no
file will be written. Default "module_features".
Boolean. Whether to show the setup code at the beginning.
Default TRUE.
.html file
data(sceCeldaCG)
if (FALSE) {
library(SingleCellExperiment)
sceCeldaCG$sum <- colSums(counts(sceCeldaCG))
rowData(sceCeldaCG)$rownames <- rownames(sceCeldaCG)
sceCeldaCG <- reportCeldaCGRun(sceCeldaCG,
initialL = 5, maxL = 20, initialK = 5,
maxK = 20, L = 10, K = 5)
reportCeldaCGPlotResults(sce = sceCeldaCG,
reducedDimName = "celda_UMAP",
features = c("Gene_1", "Gene_100"),
displayName = "rownames",
cellAnnot="sum")
}
Returns all celda models generated during a celdaGridSearch run.
resList(x, altExpName = "featureSubset")
# S4 method for SingleCellExperiment
resList(x, altExpName = "featureSubset")
# S4 method for celdaList
resList(x)
An object of class SingleCellExperiment or
celdaList.
The name for the altExp slot to use. Default "featureSubset".
List. Contains one celdaModel object for each of the parameters
specified in runParams(x).
R/perplexity.R
Calculates the perplexity of each model's cluster assignments given the provided countMatrix, as well as resamplings of that count matrix, providing a distribution of perplexities and a better sense of the quality of a given K/L choice.
resamplePerplexity(
x,
celdaList,
useAssay = "counts",
altExpName = "featureSubset",
doResampling = FALSE,
numResample = 5,
seed = 12345
)
# S4 method for SingleCellExperiment
resamplePerplexity(
x,
useAssay = "counts",
altExpName = "featureSubset",
doResampling = FALSE,
numResample = 5,
seed = 12345
)
# S4 method for ANY
resamplePerplexity(
x,
celdaList,
doResampling = FALSE,
numResample = 5,
seed = 12345
)
A numeric matrix of counts or a
SingleCellExperiment returned from celdaGridSearch
with the matrix located in the assay slot under useAssay.
Rows represent features and columns represent cells. Must contain
"celda_grid_search" slot in metadata(x) if x is a
SingleCellExperiment object.
Object of class 'celdaList'. Used only if x is a
matrix object.
A string specifying which assay
slot to use if x is a
SingleCellExperiment object. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
Boolean. If TRUE, then each cell in the counts
matrix will be resampled according to a multinomial distribution to introduce
noise before calculating perplexity. Default FALSE.
Integer. The number of times to resample the counts matrix
for evaluating perplexity if doResampling is set to TRUE.
Default 5.
Integer. Passed to with_seed. For reproducibility,
a default value of 12345 is used. If NULL, no calls to
with_seed are made.
A SingleCellExperiment object or
celdaList object with a perplexity
property, detailing the perplexity of all K/L combinations that appeared in the celdaList's models.
data(sceCeldaCGGridSearch)
sce <- resamplePerplexity(sceCeldaCGGridSearch)
plotGridSearchPerplexity(sce)
data(celdaCGSim, celdaCGGridSearchRes)
celdaCGGridSearchRes <- resamplePerplexity(
celdaCGSim$counts,
celdaCGGridSearchRes
)
plotGridSearchPerplexity(celdaCGGridSearchRes)
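The resampling controlled by doResampling can be pictured with a small base R sketch. It redraws each cell's counts from a multinomial distribution whose probabilities are the cell's observed feature proportions. The toy matrix and the helper `resampleCell` are assumptions made for illustration; this is not celda's internal implementation.

```r
set.seed(12345)  # fixed seed so the toy example is reproducible

# Toy count matrix: 5 features (rows) x 4 cells (columns)
counts <- matrix(rpois(20, lambda = 10), nrow = 5,
    dimnames = list(paste0("Gene_", 1:5), paste0("Cell_", 1:4)))

# Hypothetical helper: redraw one cell's total count across features,
# using the observed feature proportions as multinomial probabilities
resampleCell <- function(cellCounts) {
    total <- sum(cellCounts)
    if (total == 0) {
        return(cellCounts)  # nothing to redistribute
    }
    as.vector(rmultinom(1, size = total, prob = cellCounts / total))
}

# Apply the helper to every column (cell)
resampled <- apply(counts, 2, resampleCell)
dimnames(resampled) <- dimnames(counts)

# Each cell keeps its total count; only the split across features changes
colSums(counts)
colSums(resampled)
```

Repeating the draw numResample times and computing perplexity on each resampled matrix would give a distribution of values rather than a single number.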
This will return indices of features among the rownames
or rowData of a data.frame, matrix, or a SummarizedExperiment
object including a SingleCellExperiment.
Partial matching (i.e. grepping) can be used by setting
exactMatch = FALSE.
retrieveFeatureIndex(
features,
x,
by = "rownames",
exactMatch = TRUE,
removeNA = FALSE
)
Character vector of feature names to find in the rows of
x.
A data.frame, matrix, or SingleCellExperiment object to search.
Character. Where to search for features in x. If set to
"rownames" then the features will be searched for among
rownames(x). If x inherits from class
SummarizedExperiment, then by can be one of the
fields in the row annotation data.frame (i.e. one of
colnames(rowData(x))).
Boolean. Whether to only identify exact matches
or to identify partial matches using grep.
Boolean. If set to FALSE, features not found in
x will be given NA and the returned vector will be the same
length as features. If set to TRUE, then the NA
values will be removed from the returned vector. Default FALSE.
A vector of row indices for the matching features in x.
'retrieveFeatureInfo' from package 'scater'
and regex for how to use regular expressions when
exactMatch = FALSE.
data(celdaCGSim)
retrieveFeatureIndex(c("Gene_1", "Gene_5"), celdaCGSim$counts)
#> [1] 1 5
retrieveFeatureIndex(c("Gene_1", "Gene_5"), celdaCGSim$counts,
exactMatch = FALSE)
#> Warning: Feature 'Gene_1' matched multiple items in 'rownames': Gene_1,Gene_10,Gene_11,Gene_12,Gene_13,Gene_14,Gene_15,Gene_16,Gene_17,Gene_18,Gene_19,Gene_100. Only the first match will be selected.
#> Warning: Feature 'Gene_5' matched multiple items in 'rownames': Gene_5,Gene_50,Gene_51,Gene_52,Gene_53,Gene_54,Gene_55,Gene_56,Gene_57,Gene_58,Gene_59. Only the first match will be selected.
#> [1] 1 5
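The difference between exact and partial matching, and the effect of removeNA, can be sketched in base R. The helper `findFeatureIndex` below is a hypothetical stand-in written for illustration. It mirrors the behavior described above (first match wins, NA for features not found) but is not the celda implementation and emits no warnings.

```r
# Hypothetical helper; not part of celda.
findFeatureIndex <- function(features, rowNames, exactMatch = TRUE,
    removeNA = FALSE) {
    idx <- vapply(features, function(f) {
        hits <- if (exactMatch) {
            which(rowNames == f)   # whole-string equality
        } else {
            grep(f, rowNames)      # regular-expression (partial) match
        }
        # Keep only the first hit; NA when nothing matches
        if (length(hits) == 0) NA_integer_ else hits[1]
    }, integer(1), USE.NAMES = FALSE)
    if (removeNA) {
        idx <- idx[!is.na(idx)]
    }
    idx
}

rn <- paste0("Gene_", 1:20)

# Exact matching: the unknown feature yields NA
findFeatureIndex(c("Gene_1", "Gene_5", "Gene_99"), rn)

# Same query with NA values dropped
findFeatureIndex(c("Gene_1", "Gene_5", "Gene_99"), rn, removeNA = TRUE)

# Partial matching: "Gene_1" also matches Gene_10 to Gene_19,
# and only the first match is kept
findFeatureIndex("Gene_1", rn, exactMatch = FALSE)
```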
SingleCellExperiment or celdaList object
R/accessors.R
Returns details on the clustering parameters and model priors from the celdaList object when it was created.
runParams(x, altExpName = "featureSubset")
# S4 method for SingleCellExperiment
runParams(x, altExpName = "featureSubset")
# S4 method for celdaList
runParams(x)
An object of class SingleCellExperiment or class
celdaList.
The name for the altExp slot to use. Default "featureSubset".
Data Frame. Contains details on the various K/L parameters, chain parameters, seed, and final log-likelihoods derived for each model in the provided celdaList.
data(sceCeldaCGGridSearch)
runParams(sceCeldaCGGridSearch)
#> index chain K L seed logLikelihood mean_perplexity
#> 1 1 1 4 9 12345 -1226407 46.95862
#> 2 2 1 5 9 12345 -1214714 45.25311
#> 3 3 1 6 9 12345 -1214754 45.25124
#> 4 4 1 4 10 12345 -1225233 46.77100
#> 5 5 1 5 10 12345 -1212891 44.98559
#> 6 6 1 6 10 12345 -1212928 44.98491
#> 7 7 1 4 11 12345 -1225255 46.77053
#> 8 8 1 5 11 12345 -1212917 44.98514
#> 9 9 1 6 11 12345 -1212953 44.98495
data(celdaCGGridSearchRes)
runParams(celdaCGGridSearchRes)
#> index chain K L logLikelihood mean_perplexity
#> 1 1 1 4 9 -1228381 47.12902
#> 2 2 1 5 9 -1217364 45.51741
#> 3 3 1 6 9 -1217407 45.51451
#> 4 4 1 4 10 -1227891 47.04897
#> 5 5 1 5 10 -1215541 45.24781
#> 6 6 1 6 10 -1215583 45.24439
#> 7 7 1 4 11 -1227913 47.04849
#> 8 8 1 5 11 -1215567 45.24733
#> 9 9 1 6 11 -1215619 45.24380
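Since the result is an ordinary data frame, it can be filtered and sorted with base R. The sketch below rebuilds the K, L, and mean_perplexity columns from the celdaList output printed above (typed in by hand so the snippet runs without celda) and looks up the row with the lowest mean perplexity. The object names `params` and `best` are illustrative.

```r
# Values copied from the printed runParams(celdaCGGridSearchRes) output
params <- data.frame(
    index = 1:9,
    K = rep(4:6, times = 3),
    L = rep(9:11, each = 3),
    mean_perplexity = c(47.12902, 45.51741, 45.51451,
        47.04897, 45.24781, 45.24439,
        47.04849, 45.24733, 45.24380))

# Row with the lowest mean perplexity
best <- params[which.min(params$mean_perplexity), ]
best

# All runs ordered from lowest to highest mean perplexity
params[order(params$mean_perplexity), c("K", "L", "mean_perplexity")]
```

The lowest value is not automatically the right choice of K and L; several combinations here differ only in the third decimal place, so the numbers are best read alongside a perplexity plot.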
A matrix of simulated gene counts.
sampleCells
A matrix of simulated gene counts with 10 rows (genes) and 10 columns (cells).
A toy count matrix for use with celda.
Generated by Josh Campbell.
R/accessors.R
Return or set the sample labels for the cells in sce.
sampleLabel(x, altExpName = "featureSubset")
# S4 method for SingleCellExperiment
sampleLabel(x, altExpName = "featureSubset")
sampleLabel(x, altExpName = "featureSubset") <- value
# S4 method for SingleCellExperiment
sampleLabel(x, altExpName = "featureSubset") <- value
# S4 method for celdaModel
sampleLabel(x)
Can be one of
A SingleCellExperiment object returned by
celda_C, celda_G, or celda_CG, with the matrix
located in the useAssay assay slot.
Rows represent features and columns represent cells.
A celda model object.
The name for the altExp slot to use. Default "featureSubset".
Character vector of sample labels for replacement. Works
only if x is a SingleCellExperiment object.
Character vector. Contains the sample labels provided at model creation, or those automatically generated by celda.
data(sceCeldaCG)
sampleLabel(sceCeldaCG)
#> [1] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [9] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [17] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [25] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [33] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [41] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [49] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [57] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [65] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [73] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [81] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [89] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_2 Sample_2 Sample_2
#> [97] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [105] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [113] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [121] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [129] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [137] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [145] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [153] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [161] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [169] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [177] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [185] Sample_2 Sample_2 Sample_2 Sample_2 Sample_3 Sample_3 Sample_3 Sample_3
#> [193] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [201] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [209] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [217] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [225] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [233] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [241] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [249] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [257] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [265] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [273] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [281] Sample_3 Sample_3 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [289] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [297] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [305] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [313] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [321] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [329] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [337] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [345] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_5 Sample_5 Sample_5
#> [353] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [361] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [369] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [377] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [385] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [393] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [401] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [409] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [417] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [425] Sample_5
#> Levels: Sample_1 Sample_2 Sample_3 Sample_4 Sample_5
data(celdaCGMod)
sampleLabel(celdaCGMod)
#> [1] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [9] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [17] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [25] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [33] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [41] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [49] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [57] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [65] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [73] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [81] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_1
#> [89] Sample_1 Sample_1 Sample_1 Sample_1 Sample_1 Sample_2 Sample_2 Sample_2
#> [97] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [105] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [113] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [121] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [129] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [137] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [145] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [153] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [161] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [169] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [177] Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2 Sample_2
#> [185] Sample_2 Sample_2 Sample_2 Sample_2 Sample_3 Sample_3 Sample_3 Sample_3
#> [193] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [201] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [209] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [217] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [225] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [233] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [241] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [249] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [257] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [265] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [273] Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3 Sample_3
#> [281] Sample_3 Sample_3 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [289] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [297] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [305] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [313] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [321] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [329] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [337] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_4
#> [345] Sample_4 Sample_4 Sample_4 Sample_4 Sample_4 Sample_5 Sample_5 Sample_5
#> [353] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [361] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [369] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [377] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [385] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [393] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [401] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [409] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [417] Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5 Sample_5
#> [425] Sample_5
#> Levels: Sample_1 Sample_2 Sample_3 Sample_4 Sample_5
A SingleCellExperiment object containing the results of running selectFeatures and celda_C on celdaCSim.
sceCeldaC
A SingleCellExperiment object
data(celdaCSim)
sceCeldaC <- selectFeatures(celdaCSim$counts)
sceCeldaC <- celda_C(sceCeldaC,
K = celdaCSim$K,
sampleLabel = celdaCSim$sampleLabel,
nchains = 1)
#> --------------------------------------------------
#> Starting Celda_C: Clustering cells.
#> --------------------------------------------------
#> Tue Apr 2 18:57:00 2024 .. Initializing 'z' in chain 1 with 'split'
#> Tue Apr 2 18:57:00 2024 .... Completed iteration: 1 | logLik: -1282027.27277705
#> Tue Apr 2 18:57:00 2024 .... Completed iteration: 2 | logLik: -1282027.27277705
#> Tue Apr 2 18:57:00 2024 .. Finished chain 1
#> --------------------------------------------------
#> Completed Celda_C. Total time: 0.08842993 secs
#> --------------------------------------------------
A SingleCellExperiment object containing the results of running selectFeatures and celda_CG on celdaCGSim.
sceCeldaCG
A SingleCellExperiment object
data(celdaCGSim)
sceCeldaCG <- selectFeatures(celdaCGSim$counts)
sceCeldaCG <- celda_CG(sceCeldaCG,
K = celdaCGSim$K,
L = celdaCGSim$L,
sampleLabel = celdaCGSim$sampleLabel,
nchains = 1)
#> --------------------------------------------------
#> Starting Celda_CG: Clustering cells and genes.
#> --------------------------------------------------
#> Tue Apr 2 18:57:01 2024 .. Initializing 'z' in chain 1 with 'split'
#> Tue Apr 2 18:57:01 2024 .. Initializing 'y' in chain 1 with 'split'
#> Tue Apr 2 18:57:04 2024 .... Completed iteration: 1 | logLik: -1212891.16546068
#> Tue Apr 2 18:57:04 2024 .... Completed iteration: 2 | logLik: -1212891.16546068
#> Tue Apr 2 18:57:04 2024 .... Completed iteration: 3 | logLik: -1212891.16546068
#> Tue Apr 2 18:57:04 2024 .... Completed iteration: 4 | logLik: -1212891.16546068
#> Tue Apr 2 18:57:04 2024 .... Completed iteration: 5 | logLik: -1212891.16546068
#> Tue Apr 2 18:57:04 2024 .... Completed iteration: 6 | logLik: -1212891.16546068
#> Tue Apr 2 18:57:04 2024 .... Completed iteration: 7 | logLik: -1212891.16546068
#> Tue Apr 2 18:57:04 2024 .... Completed iteration: 8 | logLik: -1212891.16546068
#> Tue Apr 2 18:57:04 2024 .... Completed iteration: 9 | logLik: -1212891.16546068
#> Tue Apr 2 18:57:04 2024 .... Determining if any gene clusters should be split.
#> Tue Apr 2 18:57:04 2024 .... No additional splitting was performed.
#> Tue Apr 2 18:57:04 2024 .... Determining if any cell clusters should be split.
#> Tue Apr 2 18:57:04 2024 .... No additional splitting was performed.
#> Tue Apr 2 18:57:04 2024 .... Completed iteration: 10 | logLik: -1212891.16546068
#> Tue Apr 2 18:57:04 2024 .... Determining if any cell clusters should be split.
#> Tue Apr 2 18:57:04 2024 .... No additional splitting was performed.
#> Tue Apr 2 18:57:04 2024 .... Completed iteration: 11 | logLik: -1212891.16546068
#> Tue Apr 2 18:57:04 2024 .. Finished chain 1
#> --------------------------------------------------
#> Completed Celda_CG. Total time: 3.333782 secs
#> --------------------------------------------------
A SingleCellExperiment object containing the results of running selectFeatures and celdaGridSearch on celdaCGSim.
sceCeldaCGGridSearch
A SingleCellExperiment object
data(celdaCGSim)
sce <- selectFeatures(celdaCGSim$counts)
sceCeldaCGGridSearch <- celdaGridSearch(sce,
model = "celda_CG",
paramsTest = list(K = seq(4, 6), L = seq(9, 11)),
paramsFixed = list(sampleLabel = celdaCGSim$sampleLabel),
bestOnly = TRUE,
nchains = 1,
cores = 1,
verbose = FALSE)
A SingleCellExperiment object containing the results of running selectFeatures and celda_G on celdaGSim.
sceCeldaG
A SingleCellExperiment object
data(celdaGSim)
sceCeldaG <- selectFeatures(celdaGSim$counts)
sceCeldaG <- celda_G(sceCeldaG, L = celdaGSim$L, nchains = 1)
#> --------------------------------------------------
#> Starting Celda_G: Clustering genes.
#> --------------------------------------------------
#> Tue Apr 2 18:57:47 2024 .. Initializing 'y' in chain 1 with 'split'
#> Tue Apr 2 18:57:47 2024 .... Completed iteration: 1 | logLik: -289209.476865707
#> Tue Apr 2 18:57:47 2024 .... Completed iteration: 2 | logLik: -289209.476865707
#> Tue Apr 2 18:57:47 2024 .... Completed iteration: 3 | logLik: -289209.476865707
#> Tue Apr 2 18:57:47 2024 .... Completed iteration: 4 | logLik: -289209.476865707
#> Tue Apr 2 18:57:47 2024 .... Completed iteration: 5 | logLik: -289209.476865707
#> Tue Apr 2 18:57:47 2024 .... Completed iteration: 6 | logLik: -289209.476865707
#> Tue Apr 2 18:57:47 2024 .... Completed iteration: 7 | logLik: -289209.476865707
#> Tue Apr 2 18:57:47 2024 .... Completed iteration: 8 | logLik: -289209.476865707
#> Tue Apr 2 18:57:47 2024 .... Completed iteration: 9 | logLik: -289209.476865707
#> Tue Apr 2 18:57:47 2024 .... Determining if any gene clusters should be split.
#> Tue Apr 2 18:57:47 2024 .... No additional splitting was performed.
#> Tue Apr 2 18:57:47 2024 .... Completed iteration: 10 | logLik: -289209.476865707
#> Tue Apr 2 18:57:47 2024 .... Completed iteration: 11 | logLik: -289209.476865707
#> Tue Apr 2 18:57:47 2024 .. Finished chain 1
#> --------------------------------------------------
#> Completed Celda_G. Total time: 0.4345939 secs
#> --------------------------------------------------
R/celdaGridSearch.R
Select the chain with the best log likelihood for each
combination of tested parameters from a SCE object generated by
celdaGridSearch or from a celdaList object.
selectBestModel(x, asList = FALSE, altExpName = "featureSubset")
# S4 method for SingleCellExperiment
selectBestModel(x, asList = FALSE, altExpName = "featureSubset")
# S4 method for celdaList
selectBestModel(x, asList = FALSE)
Can be one of
A SingleCellExperiment object returned from
celdaGridSearch, recursiveSplitModule,
or recursiveSplitCell. Must contain a list named
"celda_grid_search" in metadata(x).
celdaList object.
TRUE or FALSE. Whether to return the
best model as a
celdaList object or not. If FALSE, return the best model as a
corresponding celda model object.
The name for the altExp slot to use. Default "featureSubset".
One of
A new SingleCellExperiment object containing
one model with the best log-likelihood for each set of parameters in
metadata(x). If there is only one set of parameters,
a new SingleCellExperiment object
with the matching model stored in the
metadata
"celda_parameters" slot will be returned. Otherwise, a new
SingleCellExperiment object with the subset models stored
in the metadata
"celda_grid_search" slot will be returned.
A new celdaList object containing one model with the best
log-likelihood for each set of parameters. If only one set of parameters
is in the celdaList, the best model will be returned directly
instead of a celdaList object.
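The selection rule (keep the chain with the highest log-likelihood within each parameter combination) can be sketched on a plain data frame. The table `runs` and its values are made up for illustration; a real run table would come from runParams(x). This is a sketch of the idea, not celda's implementation.

```r
# Hypothetical run table: two K values, three chains each
runs <- data.frame(
    index = 1:6,
    chain = rep(1:3, times = 2),
    K = rep(c(4, 5), each = 3),
    L = 10,
    logLikelihood = c(-1225, -1220, -1230, -1213, -1215, -1212))

# Group rows by parameter combination, then keep the best chain per group
groups <- split(runs, list(runs$K, runs$L), drop = TRUE)
best <- do.call(rbind, lapply(groups, function(g) {
    g[which.max(g$logLikelihood), ]
}))
rownames(best) <- NULL

best
```

One row remains per K/L combination: chain 2 for K = 4 and chain 3 for K = 5.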
A simple heuristic feature selection procedure.
Select features with at least minCount counts
in at least minCell cells. A SingleCellExperiment
object with subset features will be stored in the
altExp slot with name altExpName.
The name of the assay slot in altExp
will be the same as useAssay.
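The filtering rule can be written in one line of base R, which makes it easy to check on a small matrix. The toy `counts` matrix below is an assumption for illustration; the sketch reproduces only the rule stated above, not the SingleCellExperiment handling.

```r
# Toy count matrix: 4 features (rows) x 4 cells (columns)
counts <- matrix(
    c(5, 4, 3, 0,
      1, 1, 1, 1,
      0, 9, 9, 0,
      3, 3, 3, 3),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("Gene_", 1:4), paste0("Cell_", 1:4)))

minCount <- 3
minCell <- 3

# A feature is kept if at least minCell cells have >= minCount counts
keep <- rowSums(counts >= minCount) >= minCell
keep

# Subset matrix containing only the selected features
counts[keep, , drop = FALSE]
```

Gene_1 and Gene_4 pass; Gene_2 never reaches 3 counts in any cell, and Gene_3 reaches it in only 2 cells.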
selectFeatures(
x,
minCount = 3,
minCell = 3,
useAssay = "counts",
altExpName = "featureSubset"
)
# S4 method for SingleCellExperiment
selectFeatures(
x,
minCount = 3,
minCell = 3,
useAssay = "counts",
altExpName = "featureSubset"
)
# S4 method for matrix
selectFeatures(
x,
minCount = 3,
minCell = 3,
useAssay = "counts",
altExpName = "featureSubset"
)
A numeric matrix of counts or a
SingleCellExperiment
with the matrix located in the assay slot under useAssay.
Rows represent features and columns represent cells.
Minimum number of counts required for feature selection.
Minimum number of cells required for feature selection.
A string specifying the name of the assay slot to use. Default "counts".
The name for the altExp slot to use. Default "featureSubset".
A SingleCellExperiment object with an
altExpName
altExp slot. Function parameter settings are stored in the metadata
"select_features" slot.
A function to draw clustered heatmaps where one has better control over some graphical parameters such as cell size, etc.
The function also allows the rows to be aggregated using kmeans clustering. This is advisable if the number of rows is so large that R can no longer handle their hierarchical clustering, roughly more than 1000. Instead of showing all the rows separately, one can cluster the rows in advance and show only the cluster centers. The number of clusters can be tuned with the parameter kmeansK.
semiPheatmap(
mat,
color = colorRampPalette(rev(brewer.pal(n = 7, name = "RdYlBu")))(100),
kmeansK = NA,
breaks = NA,
borderColor = "grey60",
cellWidth = NA,
cellHeight = NA,
scale = "none",
clusterRows = TRUE,
clusterCols = TRUE,
clusteringDistanceRows = "euclidean",
clusteringDistanceCols = "euclidean",
clusteringMethod = "complete",
clusteringCallback = .identity2,
cutreeRows = NA,
cutreeCols = NA,
treeHeightRow = ifelse(clusterRows, 50, 0),
treeHeightCol = ifelse(clusterCols, 50, 0),
legend = TRUE,
legendBreaks = NA,
legendLabels = NA,
annotationRow = NA,
annotationCol = NA,
annotation = NA,
annotationColors = NA,
annotationLegend = TRUE,
annotationNamesRow = TRUE,
annotationNamesCol = TRUE,
dropLevels = TRUE,
showRownames = TRUE,
showColnames = TRUE,
main = NA,
fontSize = 10,
fontSizeRow = fontSize,
fontSizeCol = fontSize,
displayNumbers = FALSE,
numberFormat = "%.2f",
numberColor = "grey30",
fontSizeNumber = 0.8 * fontSize,
gapsRow = NULL,
gapsCol = NULL,
labelsRow = NULL,
labelsCol = NULL,
fileName = NA,
width = NA,
height = NA,
silent = FALSE,
rowLabel,
colLabel,
rowGroupOrder = NULL,
colGroupOrder = NULL,
...
)
numeric matrix of the values to be plotted.
vector of colors used in heatmap.
the number of kmeans clusters to make, if we want to aggregate the rows before drawing heatmap. If NA then the rows are not aggregated.
Numeric vector. A sequence of numbers that covers the range of values in the normalized `counts`. Values in the normalized `matrix` are assigned to each bin in `breaks`. Each break is assigned to a unique color from `col`. If NULL, then breaks are calculated automatically. Default NULL.
color of cell borders on heatmap, use NA if no border should be drawn.
individual cell width in points. If left as NA, then the values depend on the size of plotting window.
individual cell height in points. If left as NA, then the values depend on the size of plotting window.
character indicating if the values should be centered and
scaled in either the row direction or the column direction, or none.
Corresponding values are "row", "column" and "none".
boolean values determining if rows should be clustered or
hclust object.
boolean values determining if columns should be clustered
or hclust object.
distance measure used in clustering rows.
Possible values are "correlation" for Pearson correlation and all
the distances supported by dist, such as "euclidean",
etc. If the value is none of the above it is assumed that a distance matrix
is provided.
distance measure used in clustering columns. Possible values the same as for clusteringDistanceRows.
clustering method used. Accepts the same values as
hclust.
callback function to modify the clustering. Is
called with two parameters: original hclust object and the matrix
used for clustering. Must return a hclust object.
number of clusters the rows are divided into, based on the hierarchical clustering (using cutree). If rows are not clustered, the argument is ignored.
similar to cutreeRows, but for columns
the height of a tree for rows, if these are clustered. Default value 50 points.
the height of a tree for columns, if these are clustered. Default value 50 points.
logical to determine if legend should be drawn or not.
vector of breakpoints for the legend.
vector of labels for the legendBreaks.
data frame that specifies the annotations shown on left side of the heatmap. Each row defines the features for a specific row. The rows in the data and in the annotation are matched using corresponding row names. Note that color schemes take into account whether a variable is continuous or discrete.
similar to annotationRow, but for columns.
deprecated parameter that currently sets the annotationCol if it is missing.
list for specifying annotationRow and annotationCol track colors manually. It is possible to define the colors for only some of the features. Check examples for details.
boolean value showing if the legend for annotation tracks should be drawn.
boolean value showing if the names for row annotation tracks should be drawn.
boolean value showing if the names for column annotation tracks should be drawn.
logical to determine if unused levels are also shown in the legend.
boolean specifying if row names are shown.
boolean specifying if column names are shown.
the title of the plot
base fontsize for the plot
fontsize for rownames (Default: fontSize)
fontsize for colnames (Default: fontSize)
logical determining if the numeric values are also printed to the cells. If this is a matrix (with same dimensions as original matrix), the contents of the matrix are shown instead of original values.
format strings (C printf style) of the numbers shown in
cells. For example "%.2f" shows 2 decimal places and "%.1e"
shows exponential notation (see more in sprintf).
color of the text
fontsize of the numbers displayed in cells
vector of row indices that show where to put gaps into
heatmap. Used only if the rows are not clustered. See cutreeRows
to see how to introduce gaps to clustered rows.
similar to gapsRow, but for columns.
custom labels for rows that are used instead of rownames.
similar to labelsRow, but for columns.
file path where the picture is saved. The file type is determined by the extension in the path. The following formats are currently supported: png, pdf, tiff, bmp, jpeg. Even if the plot does not fit into the plotting window, the file size is calculated so that the plot would fit there, unless specified otherwise.
manual option for determining the output file width in inches.
manual option for determining the output file height in inches.
do not draw the plot (useful when using the gtable output)
row cluster labels for semi-clustering
column cluster labels for semi-clustering
Vector. Specifies the order of feature clusters when
semisupervised clustering is performed on the y labels.
Vector. Specifies the order of cell clusters when
semisupervised clustering is performed on the z labels.
graphical parameters for the text used in plot. Parameters
passed to grid.text, see gpar.
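The number format strings described above follow C printf conventions, so their effect can be previewed with base R's sprintf before plotting. The following is a small illustration only; the values are made up for the example.

```r
# Preview how cell values would be rendered with two printf-style formats.
values <- c(3.14159, 12345.678)

# Two decimal places.
sprintf("%.2f", values[1])  # "3.14"

# Exponential notation with one digit after the decimal point.
sprintf("%.1e", values[2])  # "1.2e+04"
```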
This function generates a SingleCellExperiment
containing a simulated counts matrix in the "counts" assay slot, as
well as various parameters used in the simulation which can be
useful for running celda and are stored in metadata slot. The user
must provide the desired model (one of celda_C, celda_G, celda_CG) as well
as any desired tuning parameters for that model's simulation function,
as detailed below.
Character. Options available in celda::availableModels.
Can be one of "celda_CG", "celda_C", or "celda_G".
Default "celda_CG".
Integer. Number of samples to simulate. Default 5. Only used if
model is one of "celda_CG" or "celda_C".
Integer vector. A vector of length 2 that specifies the lower
and upper bounds of the number of cells to be generated in each sample.
Default c(50, 100). Only used if
model is one of "celda_CG" or "celda_C".
Integer vector. A vector of length 2 that specifies the lower and upper bounds of the number of counts generated for each cell. Default c(500, 1000).
Integer. Number of cells to simulate. Default 100. Only used if
model is "celda_G".
Integer. The total number of features to be simulated. Default 100.
Integer. Number of cell populations. Default 5. Only used if
model is one of "celda_CG" or "celda_C".
Integer. Number of feature modules. Default 10. Only used if
model is one of "celda_CG" or "celda_G".
Numeric. Concentration parameter for Theta. Adds a pseudocount
to each cell population in each sample. Default 1. Only used if
model is one of "celda_CG" or "celda_C".
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell population. Default 1.
Numeric. Concentration parameter for Eta. Adds a pseudocount to
the number of features in each module. Default 5. Only used if
model is one of "celda_CG" or "celda_G".
Numeric. Concentration parameter for Psi. Adds a pseudocount to
each feature in each module. Default 1. Only used if
model is one of "celda_CG" or "celda_G".
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
A SingleCellExperiment object with simulated count matrix stored in the "counts" assay slot. Function parameter settings are stored in the metadata slot. For
"celda_CG" and "celda_C" models,
columns celda_sample_label and celda_cell_cluster in
colData contain simulated sample labels and
cell population clusters. For "celda_CG" and "celda_G"
models, column celda_feature_module in
rowData contains simulated gene modules.
sce <- simulateCells()
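As a sketch of how the returned object might be inspected, the snippet below reads the simulated labels back out of the slots named in the description above. It assumes the celda and SingleCellExperiment packages are installed and that the default "celda_CG" model is used; the variable names on the left-hand side are illustrative.

```r
library(celda)
library(SingleCellExperiment)

# Simulate with the default model ("celda_CG") and default seed.
sce <- simulateCells()

# Simulated counts matrix: features in rows, cells in columns.
simCounts <- assay(sce, "counts")
dim(simCounts)

# Simulated sample labels and cell population clusters (per cell).
table(colData(sce)$celda_sample_label)
table(colData(sce)$celda_cell_cluster)

# Simulated feature modules (per feature).
table(rowData(sce)$celda_feature_module)
```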
This function generates a list containing two count matrices -- one for real expression, the other one for contamination, as well as other parameters used in the simulation which can be useful for running decontamination.
Integer. Number of cells to be simulated. Default 300.
Integer. Number of genes to be simulated. Default 100.
Integer. Number of cell populations to be simulated.
Default 3.
Integer vector. A vector of length 2 that specifies the lower
and upper bounds of the number of counts generated for each cell. Default
c(500, 1000).
Numeric. Concentration parameter for Phi. Default 0.1.
Numeric or Numeric vector. Concentration parameter for Theta.
If input as a single numeric value, symmetric values for beta
distribution are specified; if input as a vector of length 2, the two
values will be the shape1 and shape2 parameters of the beta distribution
respectively. Default c(1, 5).
Integer. Number of markers for each cell population.
Default 3.
Integer. Passed to with_seed.
For reproducibility, a default value of 12345 is used. If NULL, no calls to
with_seed are made.
A list containing the nativeMatrix (real expression),
observedMatrix (real expression + contamination), as well as other
parameters used in the simulation.
contaminationSim <- simulateContamination(K = 3, delta = c(1, 10))
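The two accepted forms of delta described above can be illustrated with base R's rbeta. This is a minimal sketch, not the package's code: expandDelta is a hypothetical helper introduced only for this example, and the sample size is arbitrary.

```r
# Hypothetical helper: turn delta into a (shape1, shape2) pair.
# A single value is reused for both shapes (symmetric beta);
# a length-2 vector is taken as c(shape1, shape2).
expandDelta <- function(delta) {
  if (length(delta) == 1) c(delta, delta) else delta
}

set.seed(12345)

# Symmetric case: delta = 2 gives Beta(2, 2), centered on 0.5.
symShapes <- expandDelta(2)
symDraws <- rbeta(1000, shape1 = symShapes[1], shape2 = symShapes[2])

# Asymmetric case: delta = c(1, 10) gives Beta(1, 10), skewed toward 0.
asymShapes <- expandDelta(c(1, 10))
asymDraws <- rbeta(1000, shape1 = asymShapes[1], shape2 = asymShapes[2])

# Compare the sample means of the two settings.
mean(symDraws)
mean(asymDraws)
```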
Manually select a celda feature module to split into 2 or more modules. Useful for splitting up modules that show divergent expression of features in multiple cell clusters.
splitModule(
x,
module,
useAssay = "counts",
altExpName = "featureSubset",
n = 2,
seed = 12345
)
# S4 method for SingleCellExperiment
splitModule(
x,
module,
useAssay = "counts",
altExpName = "featureSubset",
n = 2,
seed = 12345
)
A SingleCellExperiment object
with the matrix located in the assay slot under useAssay.
Rows represent features and columns represent cells.
Integer. The module to be split.
A string specifying which assay
slot to use for x. Default "counts".
The name for the altExp slot
to use. Default "featureSubset".
Integer. How many modules should module be split into.
Default 2.
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
An updated SingleCellExperiment object with new
feature modules stored in column celda_feature_module in
rowData(x).
data(sceCeldaCG)
# Split module 5 into 2 new modules.
sce <- splitModule(sceCeldaCG, module = 5)
Select a subset of models from a
SingleCellExperiment object generated by
celdaGridSearch that match the criteria in the argument
params.
subsetCeldaList(x, params, altExpName = "featureSubset")
# S4 method for SingleCellExperiment
subsetCeldaList(x, params, altExpName = "featureSubset")
# S4 method for celdaList
subsetCeldaList(x, params)
Can be one of
A SingleCellExperiment object returned from
celdaGridSearch, recursiveSplitModule,
or recursiveSplitCell. Must contain a list named
"celda_grid_search" in metadata(x).
celdaList object.
List. List of parameters used to subset the matching celda
models in list "celda_grid_search" in metadata(x).
The name for the altExp slot to use. Default "featureSubset".
One of
A new SingleCellExperiment object containing
all models matching the
provided criteria in params. If only one celda model result in the
"celda_grid_search" slot in metadata(x) matches
the given criteria, a new SingleCellExperiment object
with the matching model stored in the
metadata
"celda_parameters" slot will be returned. Otherwise, a new
SingleCellExperiment object with the subset models stored
in the metadata
"celda_grid_search" slot will be returned.
A new celdaList object containing all models matching the
provided criteria in params. If only one item in the
celdaList matches the given criteria, the matching model will be
returned directly instead of a celdaList object.
celdaGridSearch can run Celda with multiple parameters and chains in parallel. selectBestModel can get the best model for each combination of parameters.
topRank() can quickly identify the top `n` rows for each column of a matrix. For example, this can be useful for identifying the top `n` features per cell.
topRank(matrix, n = 25, margin = 2, threshold = 0, decreasing = TRUE)
Numeric matrix.
Integer. Maximum number of items above `threshold` returned for each ranked row or column.
Integer. Dimension of `matrix` to rank, with 1 for rows, 2 for columns. Default 2.
Numeric. Only return ranked rows or columns in the matrix that are above this threshold. If NULL, then no threshold will be applied. Default 0.
Logical. Specifies if the rank should be decreasing. Default TRUE.
List. The `index` variable provides the top `n` row (feature) indices contributing the most to each column (cell). The `names` variable provides the rownames corresponding to these indexes.
data(sampleCells)
topRanksPerCell <- topRank(sampleCells, n = 5)
topFeatureNamesForCell <- topRanksPerCell$names[1]
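To make the ranking behavior concrete, here is a base-R sketch of the column-wise case (margin = 2) on a toy matrix. topRankSketch is a hypothetical stand-in written for this example and is not the package's implementation; it assumes "above threshold" means strictly greater than the threshold.

```r
# Hypothetical re-implementation of the column-wise case, for illustration.
topRankSketch <- function(mat, n = 25, threshold = 0, decreasing = TRUE) {
  index <- lapply(seq_len(ncol(mat)), function(j) {
    values <- mat[, j]
    # Rank the rows of this column.
    ord <- order(values, decreasing = decreasing)
    # Keep only rows strictly above the threshold, if one is given.
    if (!is.null(threshold)) {
      ord <- ord[values[ord] > threshold]
    }
    # Return at most n row indices.
    head(ord, n)
  })
  names(index) <- colnames(mat)
  list(index = index,
       names = lapply(index, function(i) rownames(mat)[i]))
}

# Toy matrix: 3 features (rows) by 2 cells (columns), filled column-wise.
toy <- matrix(c(5, 0, 2,
                1, 7, 0),
              nrow = 3,
              dimnames = list(c("geneA", "geneB", "geneC"),
                              c("cell1", "cell2")))

res <- topRankSketch(toy, n = 2)
res$names$cell1  # "geneA" "geneC"
res$names$cell2  # "geneB" "geneA"
```

Rows with a value of 0 are dropped here because the default threshold is 0, which is why geneB does not appear for cell1.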