1 Introduction

tripr is a Bioconductor package, built with shiny, that provides analytics for antigen receptor (B cell receptor immunoglobulin, BcR IG, and T cell receptor, TR) gene sequence data. Every step of the analysis can be performed interactively, so no programming skills are required. It takes as input the output files of the IMGT/HighV-Quest tool. Users can analyze the data from each input sample separately, or the combined data from all samples, and visualize the results accordingly. Functions for use from the R command line are also available.

1.1 Installation

tripr is distributed as a Bioconductor package and requires R (version “4.2”), which can be installed on any operating system from CRAN, and Bioconductor (version “3.15”).

To install the tripr package, enter the following commands in your R session:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("tripr")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

1.2 Launching the app

Once tripr is successfully installed, it can be loaded as follows:

library(tripr)

2 Running tripr as a shiny application

In order to start the shiny app, please run the following command:

tripr::run_app()

tripr should open in a browser (ideally Chrome, Firefox or Opera). If this does not happen automatically, open a browser and navigate to the address shown in the R console (for example, Listening on http://127.0.0.1:6134).

2.1 Home

In this tab, users import their data by pressing the Choose directory button and selecting the directory where the data is stored. The tool takes as input the 10 output files of the IMGT/HighV-Quest tool in text format (.txt). Users can also choose only some of the files, depending on the type of downstream analysis.

Note that every sample of the dataset must have its own individual folder, and every sample folder must be inside one root folder (see example below). To upload the dataset, select this root folder and then press the Load Data button.
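The folder layout described above can be sketched in base R. This is only an illustration: the root folder name, the sample names (Sample_A, Sample_B) and the file name 1_Summary.txt are placeholders chosen for the example, not names required by tripr.

```r
# Build a throwaway root folder with one sub-folder per sample.
# All folder and file names below are placeholders for illustration.
root <- file.path(tempdir(), "tripr_dataset")
samples <- c("Sample_A", "Sample_B")
for (s in samples) {
    dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
    file.create(file.path(root, s, "1_Summary.txt"))
}

# The immediate sub-folders of the root: one per sample.
sample_dirs <- list.dirs(root, full.names = FALSE, recursive = FALSE)

# Count the .txt files found in each sample folder.
txt_counts <- vapply(
    sample_dirs,
    function(s) length(list.files(file.path(root, s), pattern = "\\.txt$")),
    integer(1)
)
print(txt_counts)
```

A check like this, run on a real dataset before pressing Load Data, would reveal sample folders that are missing or that contain no .txt files.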

Previous sessions can also be loaded with the Restore Previous Sessions button.

There are 2 options for the cell type (T cell and B cell) and 2 options for the amount of available data (High- or Low-Throughput). The main difference between the latter two is how the preselection and selection steps are applied. For High-Throughput data, all filters are applied sequentially (i.e. if a sequence fails more than one selection criterion, only the first unsatisfied criterion is reported), whereas for Low-Throughput data all criteria are applied at the same time.
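The difference between the two modes can be illustrated with a toy example in base R. This is a sketch of the reporting behaviour only, not tripr's internal code: the data frame, its column names and the two result vectors are all invented for the illustration.

```r
# Toy pass/fail table: one row per sequence, one logical column per criterion.
# Column names are hypothetical.
checks <- data.frame(
    functional_v = c(TRUE, FALSE, FALSE, TRUE),
    clean_cdr3   = c(TRUE, TRUE,  FALSE, FALSE),
    productive   = c(TRUE, TRUE,  FALSE, TRUE)
)
criteria <- names(checks)
pass <- as.matrix(checks)

# High-Throughput style: criteria are checked in order and only the
# first failed criterion is reported (NA when every criterion passes).
sequential <- vapply(seq_len(nrow(pass)), function(i) {
    failed <- criteria[!pass[i, ]]
    if (length(failed) == 0) NA_character_ else failed[1]
}, character(1))

# Low-Throughput style: all criteria are checked at once and every
# failed criterion is reported.
simultaneous <- vapply(seq_len(nrow(pass)), function(i) {
    paste(criteria[!pass[i, ]], collapse = ", ")
}, character(1))

print(data.frame(sequential, simultaneous))
```

The third sequence fails all three criteria: the sequential report names only functional_v, whereas the simultaneous report lists all three.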

2.2 Preprocessing

tripr offers 2 steps of preprocessing:

  • Preselection: Refers to the cleaning process of the input dataset.

  • Selection: Refers to filtering the data that result from the Preselection process.

2.2.1 Preselection

The Preselection process comprises 4 different criteria:

  • Only take into account Functional V-Gene:
    Only sequences utilizing a functional V gene are included in the downstream analysis. Sequences with pseudogenes (P) or open reading frame (ORF) genes are excluded from further analysis.
  • Only take into account CDR3 with no Special Characters (X,*,#,.):
    Only sequences without ambiguities (i.e. characters other than those of the 20 amino acids) are included in the analysis.
  • Only take into account Productive Sequences:
    Only productive sequences (without stop codons and frameshifts) are included in the analysis.
  • Only take into account CDR3 with valid start/end landmarks:
    Start/End CDR3 landmarks (anchors) can be customized by the user based on the type of data (BcR/TR, heavy/light chain). More than one valid landmark can be used. The different letters should be separated with a vertical bar (e.g. F|D). Sequences with landmarks other than the chosen ones are excluded from the analysis.
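The four criteria above can be approximated in base R as a set of logical filters. This is a minimal sketch under stated assumptions, not the package's implementation: the data frame, its column names (v_functionality, productive, cdr3_aa) and the example landmarks are hypothetical, and the toy CDR3 strings are assumed to include their anchor residues.

```r
# Toy input; every column name and value is hypothetical.
seqs <- data.frame(
    v_functionality = c("F", "P", "F", "ORF", "F", "F"),
    productive      = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE),
    cdr3_aa         = c("CARDYW", "CARGYW", "CARDYW",
                        "CAKDYW", "CAR*YW", "CARDYL"),
    stringsAsFactors = FALSE
)

# Landmarks written as alternatives separated by a vertical bar.
start_landmark <- "C"
end_landmark   <- "F|W"

# 1. Functional V gene only (pseudogenes and ORF genes are dropped).
keep_functional <- seqs$v_functionality == "F"
# 2. CDR3 without the special characters X, *, # or .
keep_clean <- !grepl("[X*#.]", seqs$cdr3_aa)
# 3. Productive sequences only.
keep_productive <- seqs$productive
# 4. CDR3 starts and ends with one of the chosen landmarks.
keep_landmarks <- grepl(paste0("^(", start_landmark, ")"), seqs$cdr3_aa) &
    grepl(paste0("(", end_landmark, ")$"), seqs$cdr3_aa)

preselected <- seqs[keep_functional & keep_clean &
                    keep_productive & keep_landmarks, ]
print(preselected)
```

Because the landmarks are pasted directly into a regular expression, the vertical-bar notation (e.g. F|W) maps naturally onto regex alternation in this sketch.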