Sample n rows from a table (sample_n, tidySummarizedExperiment)

sample_n() and sample_frac() have been superseded in favour of slice_sample(). Although they will not be deprecated in the near future, being superseded means that only critical bug fixes will be made, so we recommend moving to the newer alternative.

These functions were superseded because we realised it was more convenient to have two mutually exclusive arguments to one function, rather than two separate functions. This also made it possible to clean up a few other smaller design issues with sample_n()/sample_frac():

The connection to slice() was not obvious.
The name of the first argument, tbl, is inconsistent with other single table verbs which use .data.
The size argument uses tidy evaluation, which is surprising and undocumented.
It was easier to remove the deprecated .env argument.
The ... argument was in a suboptimal position.
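
For migration, a minimal sketch of how each superseded call maps onto dplyr's slice_sample(), where the two mutually exclusive arguments n and prop replace the two separate functions. It assumes dplyr 1.0.0 or later and uses a hypothetical toy tibble `df` rather than a SummarizedExperiment, so it shows dplyr's own semantics only; whether tidySummarizedExperiment provides its own slice_sample() method is not stated on this page.

```r
library(dplyr)

# Hypothetical toy data (not a SummarizedExperiment): 100 rows in two groups
df <- tibble(id = 1:100, group = rep(c("a", "b"), each = 50))

set.seed(42)

# sample_n(df, 10) becomes slice_sample() with n =
by_count <- slice_sample(df, n = 10)

# sample_frac(df, 0.1) becomes slice_sample() with prop =
by_fraction <- slice_sample(df, prop = 0.1)

# n and prop are alternative arguments of the same function;
# here both select 10 of the 100 rows
nrow(by_count)
nrow(by_fraction)
```

Because n is an ordinary argument of slice_sample(), it also avoids the surprising tidy evaluation of size noted above.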

# S3 method for class 'SummarizedExperiment'
sample_n(tbl, size, replace = FALSE, weight = NULL, .env = NULL, ...)

# S3 method for class 'SummarizedExperiment'
sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = NULL, ...)

Arguments

tbl: A SummarizedExperiment (for this method); the dplyr generic accepts a data.frame.
size: <tidy-select> For sample_n(), the number of rows to select. For sample_frac(), the fraction of rows to select. If tbl is grouped, size applies to each group.
replace: Sample with or without replacement?
weight: <tidy-select> Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.
.env: DEPRECATED.
...: Ignored.

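To illustrate how size behaves on grouped input and how weight is used, a minimal sketch with dplyr's own data.frame method of sample_n() (not the SummarizedExperiment method) on a hypothetical tibble. The column names id, group and w are illustrative assumptions. The weights deliberately do not sum to 1, and rows with zero weight are never drawn.

```r
library(dplyr)

# Hypothetical toy data: three groups of 20 rows;
# only even ids get a non-zero weight (weights need not sum to 1)
df <- tibble(
  id = 1:60,
  group = rep(c("a", "b", "c"), each = 20),
  w = ifelse(1:60 %% 2 == 0, 2, 0)
)

set.seed(1)
sampled <- df |>
  group_by(group) |>
  sample_n(5, weight = w)  # size = 5 is applied within each group

count(sampled, group)      # expect 5 rows for each of a, b and c
all(sampled$id %% 2 == 0)  # zero-weight (odd id) rows are never selected
```

Each group has 10 rows with positive weight, which is enough to draw 5 without replacement; with fewer positive weights than size, sampling without replacement would fail.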
Value

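A tidySummarizedExperiment where possible; as the examples below show, a tibble may be returned instead (with a message) when the sampled rows cannot be kept as a SummarizedExperiment.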

References

Hutchison, W.J., Keyes, T.J., The tidyomics Consortium. et al. The tidyomics ecosystem: enhancing omic data analyses. Nat Methods 21, 1166–1170 (2024). https://doi.org/10.1038/s41592-024-02299-2

Wickham, H., François, R., Henry, L., Müller, K., Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.4, https://CRAN.R-project.org/package=dplyr

Examples

data(pasilla)
pasilla |> sample_n(50)
#> tidySummarizedExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 50 × 5
#>    .feature    .sample counts condition type      
#>    <chr>       <chr>    <int> <chr>     <chr>     
#>  1 FBgn0043796 untrt2     630 untreated single_end
#>  2 FBgn0033810 untrt2     389 untreated single_end
#>  3 FBgn0053742 trt3         0 treated   paired_end
#>  4 FBgn0035552 untrt4       0 untreated paired_end
#>  5 FBgn0037449 untrt2      37 untreated single_end
#>  6 FBgn0029811 untrt2       0 untreated single_end
#>  7 FBgn0035743 untrt2       0 untreated single_end
#>  8 FBgn0053362 untrt1       0 untreated single_end
#>  9 FBgn0034231 untrt3      42 untreated paired_end
#> 10 FBgn0041709 trt2         6 treated   paired_end
#> # ℹ 40 more rows
pasilla |> sample_frac(0.1)
#> tidySummarizedExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 10,219 × 5
#>    .feature    .sample counts condition type      
#>    <chr>       <chr>    <int> <chr>     <chr>     
#>  1 FBgn0033100 untrt1     212 untreated single_end
#>  2 FBgn0052494 trt1         0 treated   single_end
#>  3 FBgn0035113 untrt1      76 untreated single_end
#>  4 FBgn0014010 untrt2    5936 untreated single_end
#>  5 FBgn0041096 untrt1      52 untreated single_end
#>  6 FBgn0034434 untrt2     304 untreated single_end
#>  7 FBgn0034565 untrt1       1 untreated single_end
#>  8 FBgn0034936 untrt4     547 untreated paired_end
#>  9 FBgn0035331 trt1        13 treated   single_end
#> 10 FBgn0021967 untrt1     988 untreated single_end
#> # ℹ 10,209 more rows