R/tidyr_methods.R
extract() has been superseded in favour of separate_wider_regex()
because it has a more polished API and better handling of problems.
Superseded functions will not go away, but will only receive critical bug
fixes.
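The note above recommends separate_wider_regex() as the replacement. The sketch below shows a roughly equivalent call, assuming tidyr >= 1.3.0 (where separate_wider_regex() was introduced). It uses an ordinary tibble, not a SingleCellExperiment, because this page does not say whether tidySingleCellExperiment provides a separate_wider_regex() method. The data frame `df` and its values are hypothetical.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data with "g<digit>" labels
df <- tibble(groups = c("g2", "g1", "g2"))

# Unnamed pattern pieces must match but are dropped;
# named pieces become new columns (here, one column called "g")
res <- separate_wider_regex(
  df,
  groups,
  patterns = c("g", g = "[0-9]")
)
res
```

Unlike extract(), separate_wider_regex() has no convert argument. The new column therefore stays character until you convert it yourself, for example with as.integer().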
Given a regular expression with capturing groups, extract() turns
each group into a new column. If the groups don't match, or the input
is NA, the output will be NA.
Usage
# S3 method for class 'SingleCellExperiment'
extract(
  data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)
Arguments
data: A data frame.
col: <tidy-select> Column to expand.
into: Names of new variables to create, as a character vector.
Use NA to omit the variable in the output.
regex: A string representing a regular expression used to extract the
desired values. There should be one capturing group (defined by ()) for
each element of into.
remove: If TRUE, remove the input column from the output data frame.
convert: If TRUE, will run type.convert() with
as.is = TRUE on new columns. This is useful if the component
columns are integer, numeric or logical.
NB: this will cause string "NA"s to be converted to NAs.
...: Additional arguments passed on to methods.
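The argument rules above can be illustrated with a minimal base-R sketch: one capturing group per element of into, NA in into drops that column, a non-matching or NA input gives NA, and convert runs type.convert(as.is = TRUE). extract_groups() is a hypothetical helper written only for illustration. It is not the tidyr or tidySingleCellExperiment implementation, it works on a plain character vector, and it ignores remove and tidy-select.

```r
# Hypothetical helper: a simplified illustration of extract()'s documented
# behaviour, not the real tidyr / tidySingleCellExperiment code.
extract_groups <- function(x, into, regex, convert = FALSE) {
  # Start with all-NA output so unmatched or NA inputs stay NA
  out <- matrix(NA_character_, nrow = length(x), ncol = length(into))
  ok <- !is.na(x)
  # regexec() + regmatches() return the full match followed by each group
  hits <- regmatches(x[ok], regexec(regex, x[ok]))
  rows <- which(ok)
  for (k in seq_along(hits)) {
    h <- hits[[k]]
    # A successful match holds the full match plus one entry per group
    if (length(h) == length(into) + 1) out[rows[k], ] <- h[-1]
  }
  # Drop columns whose name in `into` is NA, then name the remaining ones
  keep <- !is.na(into)
  out <- as.data.frame(out[, keep, drop = FALSE], stringsAsFactors = FALSE)
  names(out) <- into[keep]
  # Mirror convert = TRUE: type.convert(as.is = TRUE) on each new column
  if (convert) out[] <- lapply(out, type.convert, as.is = TRUE)
  out
}

res <- extract_groups(c("g2", "g1", "x", NA), into = "g",
                      regex = "g([0-9])", convert = TRUE)
res$g  # integer vector: 2, 1, NA, NA
```

Because convert = TRUE, the captured digits come back as integers. The non-matching "x" and the NA input both produce NA, as the description states.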
Value
`tidySingleCellExperiment`
References
Hutchison WJ, Keyes TJ, The tidyomics Consortium, et al. The tidyomics ecosystem: enhancing omic data analyses. Nat Methods. 2024;21:1166–1170. https://doi.org/10.1038/s41592-024-02299-2
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, et al. Welcome to the tidyverse. Journal of Open Source Software. 2019;4(43):1686. https://doi.org/10.21105/joss.01686
See also
separate() to split a column by a separator.
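Examples
# Load the package that provides pbmc_small and the extract() method
library(tidySingleCellExperiment)
data(pbmc_small)

# Pull the digit out of labels such as "g1"/"g2" into an integer column "g"
pbmc_small |>
  extract(groups,
    into = "g",
    regex = "g([0-9])",
    convert = TRUE
  )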
#> # A SingleCellExperiment-tibble abstraction: 80 × 17
#> # Features=230 | Cells=80 | Assays=counts, logcounts
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents g
#> <chr> <fct> <dbl> <int> <fct> <fct> <int>
#> 1 ATGCC… SeuratPro… 70 47 0 A 2
#> 2 CATGG… SeuratPro… 85 52 0 A 1
#> 3 GAACC… SeuratPro… 87 50 1 B 2
#> 4 TGACT… SeuratPro… 127 56 0 A 2
#> 5 AGTCA… SeuratPro… 173 53 0 A 2
#> 6 TCTGA… SeuratPro… 70 48 0 A 1
#> 7 TGGTA… SeuratPro… 64 36 0 A 1
#> 8 GCAGC… SeuratPro… 72 45 0 A 1
#> 9 GATAT… SeuratPro… 52 36 0 A 1
#> 10 AATGT… SeuratPro… 100 41 0 A 1
#> # ℹ 70 more rows
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> # tSNE_2 <dbl>