Introduction

Bacteria rely on rapid alteration of transcription termination to regulate their response to envrionmental perturbations. Premature termination of transcripts as well as attenuation of transcript length (ex: only expressing some genes in an operon) are both prevalent signals and studying them has potent implications for better understanding how antimicrobials affect bacteria. High throughput sequencing for studying termination signal (known as 3’-seq or term-seq) is currently available and more data sets are being created rapidly. However, unlike other high throughput sequencing methods, there is not standardized, easily accessible, statistically robust analysis tool. Here we present PIPETS (Poisson Identification of PEaks from Term-Seq data) which analyzes 3’-seq data using a Poisson Distribution Test to identify termination signal that is statistically higher than the surrounding noise. These termination peaks can be used to identify shifts in bacterial response and give a better understanding of a very important set of targets for ongoing antimicrobial development.

Quick Start

Installation

To install PIPETS, use Bioconductor’s BiocManager package.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("PIPETS")

Basic PIPETS Run

PIPETS accepts two types of data as inputs: Bed Files or GRanges objects. Both sets of data undergo the same analysis method. Running the method differs slightly if the user is using a GRanges input. Here we show the basic walkthrough of running PIPETS on an input Bed file. The five parameters that need to be specified to run PIPETS are inputData, the readScoreMinimum of “good quality” reads,
the prefix for the output file name OutputFileID, the file path to the output director OutputFileDir ,and the type of input data inputDataFormat.

Bed File Input

library(PIPETS)


#Bed File Input
PIPETS_FullRun(inputData = "PIPETS_TestData.bed"
               ,readScoreMinimum = 42, OutputFileID = "ExampleResultsRun", 
               OutputFileDir = tempdir(),
               inputDataFormat = "bedFile")

GRanges Object Input

library(PIPETS)
library(BiocGenerics)
library(GenomicRanges)
#GRanges object input
#When run on GRanges objects, PIPETS also outputs the strand split GRanges objects to a list for the user
#The first item in the list are the + strand reads, and the second are the - strand reads
GRanges_Object <- read.delim(file = "PIPETS_TestData.bed",
        header = FALSE, stringsAsFactors = FALSE)

GRanges_Object <- GRanges(seqnames = GRanges_Object$V1,
        ranges = IRanges(start = GRanges_Object$V2,end = GRanges_Object$V3
        ),score = GRanges_Object$V5 ,strand = GRanges_Object$V6)

ResultsList <- PIPETS_FullRun(inputData = GRanges_Object, readScoreMinimum = 42, 
               OutputFileDir = tempdir(),
               OutputFileID = "ExampleResultsRun", inputDataFormat = "GRanges")

head(ResultsList)
## [[1]]
## GRanges object with 2232 ranges and 1 metadata column:
##           seqnames      ranges strand |     score
##              <Rle>   <IRanges>  <Rle> | <integer>
##      [1] NC_003028 29924-29982      + |        42
##      [2] NC_003028 29933-29991      + |        42
##      [3] NC_003028 29933-29991      + |        42
##      [4] NC_003028 30321-30379      + |        42
##      [5] NC_003028 30321-30379      + |        42
##      ...       ...         ...    ... .       ...
##   [2228] NC_003028 34660-34718      + |        42
##   [2229] NC_003028 34663-34721      + |        42
##   [2230] NC_003028 34663-34721      + |        42
##   [2231] NC_003028 34664-34722      + |        42
##   [2232] NC_003028 34665-34723      + |        42
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
## 
## [[2]]
## GRanges object with 2864 ranges and 1 metadata column:
##           seqnames      ranges strand |     score
##              <Rle>   <IRanges>  <Rle> | <integer>
##      [1] NC_003028     641-699      - |        42
##      [2] NC_003028     642-700      - |        42
##      [3] NC_003028     643-701      - |        42
##      [4] NC_003028     648-706      - |        42
##      [5] NC_003028     648-706      - |        42
##      ...       ...         ...    ... .       ...
##   [2860] NC_003028 34665-34723      - |        42
##   [2861] NC_003028 34665-34723      - |        42
##   [2862] NC_003028 34665-34723      - |        42
##   [2863] NC_003028 34665-34723      - |        42
##   [2864] NC_003028 34665-34723      - |        42
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

PIPETS Output

Per input Bed file, PIPETS will create 4 output files in the R directory. All output files will begin with the user defined file name.

First, PIPETS creates strand-split Bed files which it uses during analysis. PIPETS takes the input Bed file, separates the Plus-Strand (+) and the Complement-Strand (-) reads and counts how many reads are assigned to each genomic coordinate.