Read a BAM file

read_bam(file, index = file, paired = FALSE)

Arguments

file

A connection or path to a BAM file

index

The path to the BAM index file

paired

Whether to treat alignments as paired end (TRUE) or single end (FALSE). Default is FALSE.

Value

A DeferredGenomicRanges object

Details

Reading a BAM file is deferred until an action such as using summarise() or mutate() occurs. If paired is set to TRUE, when alignments are loaded, the GRanges has two additional columns called read_pair_id and read_pair_group corresponding to paired reads and is grouped by the read_pair_group.

Certain verbs have different behaviour, after using read_bam().

For select() valid columns are the fields available in the BAM file. Valid entries are qname (QNAME), flag (FLAG), rname (RNAME), strand, pos (POS), qwidth (width of query), mapq (MAPQ), cigar (CIGAR), mrnm (RNEXT), mpos (PNEXT), isize (TLEN), seq (SEQ), and qual (QUAL). Any two character tags in the BAM file are also valid.

For filter() the following fields are valid, to select the FALSE option place ! in front of the field:

  • is_paired Select either unpaired (FALSE) or paired (TRUE) reads.

  • is_proper_pair Select either improperly paired (FALSE) or properly paired (TRUE) reads. This is dependent on the alignment software used.

  • `is_unmapped_query`` Select unmapped (TRUE) or mapped (FALSE) reads.

  • has_unmapped_mate Select reads with mapped (FALSE) or unmapped (TRUE) mates.

  • is_minus_strand Select reads aligned to plus (FALSE) or minus (TRUE) strand.

  • is_mate_minus_strand Select reads where mate is aligned to plus (FALSE) or minus (TRUE) strand.

  • is_first_mate_read Select reads if they are the first mate (TRUE) or not (FALSE).

  • is_second_mate_read Select reads if they are the second mate (TRUE) or not (FALSE).

  • is_secondary_alignment Select reads if their alignment status is secondary (TRUE) or not (FALSE). This might be relevant if there are multimapping reads.

  • is_not_passing_quality_controls Select reads that either pass quality controls (FALSE) or that do not (TRUE).

  • is_duplicate Select reads that are unduplicated (FALSE) or duplicated (TRUE). This may represent reads that are PCR or optical duplicates.

Examples

if (require(pasillaBamSubset)) {
   bamfile <- untreated1_chr4()
   # nothing is read until an action has been performed
   print(read_bam(bamfile))
   # define a region of interest
   roi <- data.frame(seqnames = "chr4", start = 5e5, end = 7e5) %>%
            as_granges()
   rng <- read_bam(bamfile) %>% 
            select(mapq) %>%
            filter_by_overlaps(roi)
}
#> Loading required package: pasillaBamSubset
#> DeferredGenomicRanges object with 0 ranges and 0 metadata columns:
#>    seqnames    ranges strand
#>       <Rle> <IRanges>  <Rle>
#>   -------
#>   seqinfo: no sequences