plyranges provides a consistent interface for importing and wrangling genomics data from a variety of sources. The package defines a grammar of genomic data transformation based on dplyr and the Bioconductor packages IRanges, GenomicRanges, and rtracklayer. It does this by providing a set of verbs for developing analysis pipelines based on GRanges objects that represent genomic regions:
- Modify genomic regions with the mutate() and stretch() functions.
- Modify genomic regions while fixing the start/end/center coordinates with the anchor_ family of functions.
- Sort genomic ranges with arrange().
- Modify, subset, and aggregate genomic data with the mutate(), filter(), and summarise() functions.
- Any of the above operations can be performed on partitions of the data with group_by().
- Find nearest neighbour genomic regions with the join_nearest_ family of functions.
- Find overlaps between ranges with the join_overlaps_ family of functions.
- Add additional metadata between ranges and a table with the join_mcols_ family of functions.
- Merge all overlapping and adjacent genomic regions with reduce_ranges().
- Merge the end points of all genomic regions with disjoin_ranges().
- Import and write common genomic data formats with the read_/write_ family of functions.
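As a rough illustration of how several of these verbs compose, here is a minimal sketch. It assumes plyranges is installed and that R is at least version 4.1 (for the native pipe). The toy data frame, its values, and the object names (gr, high, resized, merged, query, hits, by_chr) are invented for this example and do not come from the package documentation. The overlap join is called through join_overlap_inner(), one member of the overlap join family.

```r
library(plyranges)

# Hypothetical toy data: as_granges() expects seqnames, start,
# and either end or width; other columns become metadata.
df <- data.frame(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  start    = c(1, 5, 20, 1),
  width    = c(10, 10, 5, 8),
  strand   = c("+", "+", "-", "+"),
  score    = c(2.5, 4.0, 1.0, 3.0)
)
gr <- as_granges(df)

# Subset on a metadata column, then add a derived column.
high <- gr |>
  filter(score > 1.5) |>
  mutate(score_log = log2(score))

# Fix the start coordinate, then set every width to 5.
resized <- gr |>
  anchor_start() |>
  mutate(width = 5)

# Merge overlapping and adjacent ranges.
merged <- reduce_ranges(gr)

# Keep ranges in gr that overlap a (made-up) query region,
# carrying the query's metadata along.
query <- as_granges(
  data.frame(seqnames = "chr1", start = 8, width = 5, label = "peak")
)
hits <- join_overlap_inner(gr, query)

# Aggregate a metadata column within groups.
by_chr <- gr |>
  group_by(seqnames) |>
  summarise(mean_score = mean(score))
```

Each step returns a new object and leaves gr unchanged, which is what allows the verbs to be chained into longer pipelines.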
Documentation
For more details on the features of plyranges, read the introductory vignette and the examples vignette.
For a complete case study on using plyranges to combine ATAC-seq and RNA-seq results, read the fluentGenomics workflow.
plyranges is part of the tidyomics project, providing a dplyr-based interface for many types of genomics datasets represented in Bioconductor.
Installation
plyranges can be installed from the latest Bioconductor release:
# install.packages("BiocManager")
BiocManager::install("plyranges")
To install the development version from GitHub:
BiocManager::install("tidyomics/plyranges")
Learning more
In addition to the two package vignettes, see the following for more information:
- The fluentGenomics workflow package shows how to combine differential gene expression and differential chromatin accessibility using plyranges.
- The extended vignette in the plyrangesWorkshops package has a detailed walkthrough of using plyranges for coverage analysis.
- The tidy ranges tutorial is a collection of genomic range applications, including plyranges.
Citation
If you find plyranges useful for your work, please cite our paper:
@ARTICLE{Lee2019,
title = "plyranges: a grammar of genomic data transformation",
author = "Lee, Stuart and Cook, Dianne and Lawrence, Michael",
journal = "Genome Biol.",
volume = 20,
number = 1,
pages = "4",
month = jan,
year = 2019,
url = "http://dx.doi.org/10.1186/s13059-018-1597-8",
doi = "10.1186/s13059-018-1597-8",
pmc = "PMC6320618"
}
Contributing
We welcome contributions from the R/Bioconductor community. We ask that contributors follow the code of conduct and the guide outlined here.
