R/reduce_dimensions.R
reduce_dimensions-methods.Rdreduce_dimensions() takes as input a `SummarizedExperiment` and calculates the reduced dimensional space of the transcript abundance.
reduce_dimensions(
.data,
assay,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
...,
log_transform = NULL
)
# S4 method for class 'SummarizedExperiment'
reduce_dimensions(
.data,
assay,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
...,
log_transform = NULL
)
# S4 method for class 'RangedSummarizedExperiment'
reduce_dimensions(
.data,
assay,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
...,
log_transform = NULL
)A `SummarizedExperiment`
Character string: the name of the assay to use for dimension reduction (must match `assayNames(.data)`). This argument must be explicitly specified so that the choice of abundance matrix is deliberate.
A character string. The dimension reduction algorithm to use (PCA, MDS, tSNE).
An integer. The number of dimensions your are interested in (e.g., 4 for returning the first four principal components).
An integer. How many top genes to select for dimensionality reduction
A boolean. In case the input is a tidybulk object, it indicates Whether the element column will be sample or transcript column
A function that will tranform the counts, by default it is log1p for RNA sequencing data, but for avoinding tranformation you can use identity
A boolean for method="PCA", this will be passed to the `prcomp` function. It is not included in the ... argument because although the default for `prcomp` if FALSE, it is advisable to set it as TRUE.
Further parameters passed to the function prcomp if you choose method="PCA" or Rtsne if you choose method="tSNE", or uwot::tumap if you choose method="umap"
DEPRECATED - A boolean, whether the value should be log-transformed (e.g., TRUE for RNA sequencing data)
A tbl object with additional columns for the reduced dimensions
A `SummarizedExperiment` object
A `SummarizedExperiment` object
`r lifecycle::badge("maturing")`
This function reduces the dimensions of the transcript abundances. It can use multi-dimensional scaling (MDS; DOI.org/10.1186/gb-2010-11-3-r25), principal component analysis (PCA), or tSNE (Jesse Krijthe et al. 2018)
Underlying method for PCA: prcomp(scale = scale, ...)
Underlying method for MDS: limma::plotMDS(ndim = .dims, plot = FALSE, top = top)
Underlying method for tSNE: Rtsne::Rtsne(data, ...)
Underlying method for UMAP: variable features from the chosen assay are optionally PCA-reduced, then uwot::tumap() is applied to the sample coordinates matrix.
Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7
Krijthe, J. H. (2015). Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation. R package version 0.15.
McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426.
## Load airway dataset for examples
data('airway', package = 'airway')
# Ensure a 'condition' column exists for examples expecting it
SummarizedExperiment::colData(airway)$condition <- SummarizedExperiment::colData(airway)$dex
counts.MDS =
airway |>
identify_abundant() |>
reduce_dimensions(assay = "counts", method="MDS", .dims = 3)
#> Warning: All samples appear to belong to the same group.
#> Getting the 500 most variable genes
#> [1] "MDS result_df colnames: sample, 1, 2, 3"
#> tidybulk says: to access the raw results do `metadata(.)$tidybulk$MDS`
counts.PCA =
airway |>
identify_abundant() |>
reduce_dimensions(assay = "counts", method="PCA", .dims = 3)
#> Warning: All samples appear to belong to the same group.
#> Getting the 500 most variable genes
#> Fraction of variance explained by the selected principal components
#> # A tibble: 3 × 2
#> `Fraction of variance` PC
#> <dbl> <int>
#> 1 0.0650 1
#> 2 0.0453 2
#> 3 0.0233 3
#> tidybulk says: to access the raw results do `metadata(.)$tidybulk$PCA`