Group by one or more variables

Most data operations are done on groups defined by variables. group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.
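As a minimal sketch of the round trip described above, the following uses an ordinary data frame and dplyr rather than a SummarizedExperiment; the toy data and object names are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy data, not shipped with any package
df <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))

# Convert the data frame into a grouped tbl
grouped <- group_by(df, g)

# Operations on the grouped tbl are performed once per group
per_group <- summarise(grouped, avg = mean(x))

# ungroup() removes the grouping again
plain <- ungroup(grouped)

is_grouped_df(grouped)
is_grouped_df(plain)
```

Here `per_group` holds one row per value of `g`, while `plain` has the same rows as `df` with no grouping attached.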

# S3 method for class 'SummarizedExperiment'
group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

In group_by(), variables or computations to group by. Computations are always done on the ungrouped data frame. To perform computations on the grouped data, you need to use a separate mutate() step before the group_by(). Computations are not allowed in nest_by(). In ungroup(), variables to remove from the grouping.
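The difference between a computation inside group_by() and a separate mutate() step can be sketched as follows. This assumes a plain data frame and dplyr; the column names `g`, `x`, and `above` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(g = c("a", "a", "b"), x = c(1, 5, 10))

# The expression inside group_by() is evaluated on the ungrouped data,
# so mean(x) is the overall mean even though df was grouped by g first.
by_expr <- df |>
  group_by(g) |>
  group_by(above = x > mean(x))

# To compare each value with its own group's mean, compute the column in a
# separate mutate() step while still grouped by g, then group by the result.
by_step <- df |>
  group_by(g) |>
  mutate(above = x > mean(x)) |>
  group_by(above)

by_expr$above
by_step$above
```

The two results differ because the overall mean of `x` is not the same as the per-group means.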

.add

When FALSE, the default, group_by() will override existing groups. To add to the existing groups, use .add = TRUE.

This argument was previously called add, but that prevented creating a new grouping variable called add, and conflicted with our naming conventions.
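A small sketch of the two behaviours, assuming a plain data frame with hypothetical columns `g1` and `g2`:

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(g1 = c("a", "a", "b"), g2 = c("x", "y", "y"), v = 1:3)

# Default (.add = FALSE): the second group_by() overrides the first
replaced <- df |>
  group_by(g1) |>
  group_by(g2)

# .add = TRUE: g2 is appended to the existing grouping by g1
added <- df |>
  group_by(g1) |>
  group_by(g2, .add = TRUE)

group_vars(replaced)
group_vars(added)
```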

.drop

Drop groups formed by factor levels that don't appear in the data? The default is TRUE except when .data has been previously grouped with .drop = FALSE. See group_by_drop_default() for details.
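The effect of .drop is easiest to see with a factor that has an unused level. The sketch below assumes a plain data frame; the factor `f` and its levels are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: level "c" never appears in the data
df <- data.frame(
  f = factor(c("a", "b"), levels = c("a", "b", "c")),
  x = 1:2
)

# Default: the empty group for level "c" is dropped
dropped <- df |>
  group_by(f) |>
  summarise(n = n())

# .drop = FALSE: the empty group is kept and summarised with a count of zero
kept <- df |>
  group_by(f, .drop = FALSE) |>
  summarise(n = n())

nrow(dropped)
nrow(kept)
```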

Value

A grouped data frame with class grouped_df, unless the combination of ... and .add yields an empty set of grouping columns, in which case a tibble will be returned.
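The two possible return types can be checked on a plain data frame, as in this sketch (the toy data are hypothetical, and the SummarizedExperiment method is not exercised here):

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(g = c("a", "b"), x = 1:2)

# At least one grouping column: the result is a grouped_df
grouped <- group_by(df, g)

# No grouping columns at all: the result is an ordinary tibble
ungrouped <- group_by(df)

inherits(grouped, "grouped_df")
inherits(ungrouped, "grouped_df")
inherits(ungrouped, "tbl_df")
```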

Methods

These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

Methods available in currently loaded packages:

group_by(): dplyr (data.frame), plotly (plotly), tidySummarizedExperiment (SummarizedExperiment).
ungroup(): dplyr (data.frame, grouped_df, rowwise_df), plotly (plotly).
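One way to inspect which methods are registered in your own session is the base R function methods(). This is only a sketch: the list depends on which packages are loaded, so it will generally differ from the one shown above.

```r
library(dplyr)

# List the S3 methods currently registered for the group_by() generic.
# Loading further packages (e.g. tidySummarizedExperiment) adds entries.
group_by_methods <- as.character(methods("group_by"))

group_by_methods
```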

Ordering

Currently, group_by() internally orders the groups in ascending order. This results in ordered output from functions that aggregate groups, such as summarise().

When used as grouping columns, character vectors are ordered in the C locale for performance and reproducibility across R sessions. If the resulting ordering of your grouped operation matters and is dependent on the locale, you should follow up the grouped operation with an explicit call to arrange() and set the .locale argument. For example:

data %>%
  group_by(chr) %>%
  summarise(avg = mean(x)) %>%
  arrange(chr, .locale = "en")

This is often useful as a preliminary step before generating content intended for humans, such as an HTML table.
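The C-locale ordering can be seen with mixed-case strings, where all uppercase letters sort before lowercase ones. The sketch below assumes dplyr 1.1.0 or later, no dplyr.legacy_locale option set, and a hypothetical toy data frame.

```r
library(dplyr)

# Hypothetical toy data with mixed-case group labels
df <- data.frame(chr = c("b", "A", "a", "B"), x = 1:4)

# Groups are ordered in the C locale: uppercase before lowercase
res <- df |>
  group_by(chr) |>
  summarise(avg = mean(x))

res$chr
```

A locale-aware ordering for display would then be requested in a follow-up arrange() call with .locale set, as in the example above.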

Legacy behavior

Prior to dplyr 1.1.0, character vector grouping columns were ordered in the system locale. If you need to temporarily revert to this behavior, you can set the global option dplyr.legacy_locale to TRUE, but this should be used sparingly and you should expect this option to be removed in a future version of dplyr. It is better to update existing code to explicitly call arrange(.locale = ) instead. Note that setting dplyr.legacy_locale will also force calls to arrange() to use the system locale.

References

Hutchison, W.J., Keyes, T.J., The tidyomics Consortium. et al. The tidyomics ecosystem: enhancing omic data analyses. Nat Methods 21, 1166–1170 (2024). https://doi.org/10.1038/s41592-024-02299-2

Wickham, H., François, R., Henry, L., Müller, K., Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.4, https://CRAN.R-project.org/package=dplyr

Examples

library(tidySummarizedExperiment)
data(pasilla)
# Group the observations by sample
pasilla |> group_by(.sample)
#> tidySummarizedExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 102,193 × 5
#> # Groups:   .sample [7]
#>    .feature    .sample counts condition type      
#>    <chr>       <chr>    <int> <chr>     <chr>     
#>  1 FBgn0000003 untrt1       0 untreated single_end
#>  2 FBgn0000008 untrt1      92 untreated single_end
#>  3 FBgn0000014 untrt1       5 untreated single_end
#>  4 FBgn0000015 untrt1       0 untreated single_end
#>  5 FBgn0000017 untrt1    4664 untreated single_end
#>  6 FBgn0000018 untrt1     583 untreated single_end
#>  7 FBgn0000022 untrt1       0 untreated single_end
#>  8 FBgn0000024 untrt1      10 untreated single_end
#>  9 FBgn0000028 untrt1       0 untreated single_end
#> 10 FBgn0000032 untrt1    1446 untreated single_end
#> # ℹ 102,183 more rows