Keep variable transcripts — keep

keep_variable() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) and returns an object of the same class as the input, filtered to retain only the `top` most variable features.

keep_variable(
  .data,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = TRUE
)

# S4 method for class 'SummarizedExperiment'
keep_variable(.data, top = 500, transform = log1p)

# S4 method for class 'RangedSummarizedExperiment'
keep_variable(.data, top = 500, transform = log1p)

Arguments

.data: A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))
.abundance: The name of the transcript/gene abundance column
top: Integer. Number of most variable transcripts to keep
transform: A function that will transform the counts. The default is log1p, which suits RNA sequencing data; to avoid transformation, use identity
log_transform: DEPRECATED. Use transform instead.
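
To see why the choice of `transform` can matter, the base R sketch below ranks two made-up genes by variance with and without log1p. The matrix `counts` and the helper `row_variance` are hypothetical names used only for illustration. The sketch assumes the transformation is applied to the counts before variability is measured, and it does not call tidybulk itself.

```r
# Hypothetical toy counts: two genes (rows) across four samples (columns).
counts <- matrix(
  c(  0, 50,   5, 200,
    100, 20, 400,  60),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene_b", "gene_d"), paste0("s", 1:4))
)

# Per-row variance (population form), written as an illustrative helper.
row_variance <- function(x) rowMeans((x - rowMeans(x))^2)

var_log <- row_variance(log1p(counts))     # log1p-transformed counts
var_raw <- row_variance(identity(counts))  # identity leaves counts unchanged

names(which.max(var_log))  # "gene_b"
names(which.max(var_raw))  # "gene_d"
```

On the log scale the gene that swings between zero and high counts ranks first, whereas on the raw scale the gene with the largest absolute counts dominates.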

Value

An object of the same class as the input, filtered to contain only the `top` most variable features.

Underlying method:

  s <- rowMeans((x - rowMeans(x)) ^ 2)
  o <- order(s, decreasing = TRUE)
  x <- x[o[1L:top], , drop = FALSE]
  variable_trancripts = rownames(x)

A `SummarizedExperiment` object
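
The underlying method listed above can be tried on a small matrix without any Bioconductor packages. The following is a minimal sketch, not the package's implementation: the function `top_variable` and the toy matrix `counts` are hypothetical, and applying `transform` before the variance step is an assumption.

```r
# Hypothetical toy abundance matrix: rows are features, columns are samples.
counts <- matrix(
  c( 10, 12,  11,   9,   # gene_a: nearly flat
      0, 50,   5, 200,   # gene_b: highly variable
      3,  3,   3,   3,   # gene_c: constant
    100, 20, 400,  60),  # gene_d: variable
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene_", letters[1:4]), paste0("s", 1:4))
)

# Illustrative helper that follows the steps of the underlying method.
top_variable <- function(x, top = 500, transform = log1p) {
  x <- transform(x)                     # assumed to happen before ranking
  top <- min(top, nrow(x))              # guard against top > number of rows
  s <- rowMeans((x - rowMeans(x))^2)    # per-feature variance
  o <- order(s, decreasing = TRUE)      # most variable first
  rownames(x[o[seq_len(top)], , drop = FALSE])
}

selected <- top_variable(counts, top = 2)
selected  # "gene_b" "gene_d"
```

The `min(top, nrow(x))` guard is an addition for the sketch so that it still runs when fewer than `top` features are available.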

Details

`r lifecycle::badge("maturing")`

At the moment this function uses edgeR https://doi.org/10.1093/bioinformatics/btp616

References

Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7

Examples

## Load airway dataset for examples
data('airway', package = 'airway')

# Ensure a 'condition' column exists for examples expecting it
SummarizedExperiment::colData(airway)$condition <- SummarizedExperiment::colData(airway)$dex

keep_variable(airway, top = 500)
#> Warning: tidybulk says: highly abundant transcripts were not identified (i.e. identify_abundant()) or filtered (i.e., keep_abundant), therefore this operation will be performed on unfiltered data. In rare occasions this could be wanted. In standard whole-transcriptome workflows is generally unwanted.
#> Getting the 500 most variable genes
#> class: RangedSummarizedExperiment 
#> dim: 500 8 
#> metadata(1): ''
#> assays(1): counts
#> rownames(500): ENSG00000129824 ENSG00000229807 ... ENSG00000205592
#>   ENSG00000243232
#> rowData names(10): gene_id gene_name ... seq_coord_system symbol
#> colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
#> colData names(10): SampleName cell ... BioSample condition