R/ranges-overlap-filter.R
ranges-filter-overlaps.Rd
Filter by overlapping/non-overlapping ranges
filter_by_overlaps(x, y, maxgap = -1L, minoverlap = 0L)
filter_by_non_overlaps(x, y, maxgap, minoverlap)
filter_by_overlaps_directed(x, y, maxgap = -1L, minoverlap = 0L)
filter_by_non_overlaps_directed(x, y, maxgap, minoverlap)
Objects representing ranges
The maximimum gap between intervals as a single
integer greater than or equal to -1. If you modify this argument,
minoverlap
must be held fixed.
The minimum amount of overlap between intervals
as a single integer greater than 0. If you modify this argument,
maxgap
must be held fixed.
a Ranges object
By default, filter_by_overlaps
and
filter_by_non_overlaps
ignore strandedness for GRanges()
objects. To perform stranded operations use filter_by_overlaps_directed
and filter_by_non_overlaps_directed
. The argument maxgap
is the maximum number of positions
between two ranges for them to be considered overlapping. Here the default
is set to be -1 as that is the the gap between two ranges that
has its start or end strictly inside the other. The argugment
minoverlap
refers to the minimum number of positions
overlapping between ranges, to consider there to be overlap.
IRanges::subsetByOverlaps()
df <- data.frame(seqnames = c("chr1", rep("chr2", 2),
rep("chr3", 3), rep("chr4", 4)),
start = 1:10,
width = 10:1,
strand = c("-", "+", "+", "*", "*", "+", "+", "+", "-", "-"),
name = letters[1:10])
query <- as_granges(df)
df2 <- data.frame(seqnames = c(rep("chr2", 2), rep("chr1", 3), "chr2"),
start = c(4,3,7,13,1,4),
width = c(6,6,3,3,3,9),
strand = c(rep("+", 3), rep("-", 3)))
subject <- as_granges(df2)
filter_by_overlaps(query, subject)
#> GRanges object with 3 ranges and 1 metadata column:
#> seqnames ranges strand | name
#> <Rle> <IRanges> <Rle> | <character>
#> [1] chr1 1-10 - | a
#> [2] chr2 2-10 + | b
#> [3] chr2 3-10 + | c
#> -------
#> seqinfo: 4 sequences from an unspecified genome; no seqlengths
filter_by_overlaps_directed(query, subject)
#> GRanges object with 3 ranges and 1 metadata column:
#> seqnames ranges strand | name
#> <Rle> <IRanges> <Rle> | <character>
#> [1] chr1 1-10 - | a
#> [2] chr2 2-10 + | b
#> [3] chr2 3-10 + | c
#> -------
#> seqinfo: 4 sequences from an unspecified genome; no seqlengths
filter_by_non_overlaps(query, subject)
#> GRanges object with 7 ranges and 1 metadata column:
#> seqnames ranges strand | name
#> <Rle> <IRanges> <Rle> | <character>
#> [1] chr3 4-10 * | d
#> [2] chr3 5-10 * | e
#> [3] chr3 6-10 + | f
#> [4] chr4 7-10 + | g
#> [5] chr4 8-10 + | h
#> [6] chr4 9-10 - | i
#> [7] chr4 10 - | j
#> -------
#> seqinfo: 4 sequences from an unspecified genome; no seqlengths
filter_by_non_overlaps_directed(query, subject)
#> GRanges object with 7 ranges and 1 metadata column:
#> seqnames ranges strand | name
#> <Rle> <IRanges> <Rle> | <character>
#> [1] chr3 4-10 * | d
#> [2] chr3 5-10 * | e
#> [3] chr3 6-10 + | f
#> [4] chr4 7-10 + | g
#> [5] chr4 8-10 + | h
#> [6] chr4 9-10 - | i
#> [7] chr4 10 - | j
#> -------
#> seqinfo: 4 sequences from an unspecified genome; no seqlengths