1 Introduction

scRepertoire is designed to take filtered contig outputs from the 10x Genomics Cell Ranger pipeline, process that data to assign clonotypes based on two TCR or Ig chains, and analyze clonotype dynamics. The latter can be separated into 1) clonotype-only analysis functions, such as unique clonotypes or clonal space quantification, and 2) interaction with mRNA expression data using the Seurat or SingleCellExperiment packages.

1.1 Loading Libraries

suppressMessages(library(scRepertoire))

1.2 Loading Data

1.2.1 What data to load into scRepertoire?

scRepertoire works with the filtered_contig_annotations.csv output from 10x Genomics Cell Ranger. This file is located in the ./outs/ directory of the VDJ alignment folder. To generate a list of contigs to use for scRepertoire:

  • load the filtered_contig_annotations.csv for each of the samples.
  • make a list in the R environment.
S1 <- read.csv(".../Sample1/outs/filtered_contig_annotations.csv")
S2 <- read.csv(".../Sample2/outs/filtered_contig_annotations.csv")
S3 <- read.csv(".../Sample3/outs/filtered_contig_annotations.csv")
S4 <- read.csv(".../Sample4/outs/filtered_contig_annotations.csv")

contig_list <- list(S1, S2, S3, S4)
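
With many samples, the same list can be built programmatically instead of one read.csv() call per sample. The following is a minimal base-R sketch, assuming each sample sits in its own folder with an outs/ subdirectory. The temporary directory, the sample names, and the toy CSV contents are hypothetical stand-ins so that the example runs on its own.

```r
# Hypothetical layout: <root>/<sample>/outs/filtered_contig_annotations.csv
root <- file.path(tempdir(), "MyExperiment")
samples <- c("Sample1", "Sample2")

# Write toy files so the sketch is self-contained (real files come from Cell Ranger)
for (s in samples) {
  out.dir <- file.path(root, s, "outs")
  dir.create(out.dir, recursive = TRUE, showWarnings = FALSE)
  toy <- data.frame(barcode = c("AAAC-1", "AAAG-1"),
                    chain = c("TRA", "TRB"))
  write.csv(toy, file.path(out.dir, "filtered_contig_annotations.csv"),
            row.names = FALSE)
}

# Read one data frame per sample and keep the sample names on the list
files <- file.path(root, samples, "outs", "filtered_contig_annotations.csv")
contig_list <- lapply(files, read.csv)
names(contig_list) <- samples
```

Naming the list elements is optional, but it makes it easier to confirm later that the sample labels are supplied in the same order as the files.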

1.2.2 Other alignment workflows

Beyond the default 10x Genomics Cell Ranger pipeline outputs, scRepertoire supports other single-cell formats through loadContigs(); the examples below use the TRUST4 and WAT3R formats.

loadContigs() can be given a directory where the sequencing experiments are located, and it will recursively load and process the contig data based on the file names. Alternatively, loadContigs() can be given a list of data frames and will process the contig data.

#Directory example
contig.output <- c("~/Documents/MyExperiment")
contig.list <- loadContigs(input = contig.output, 
                           format = "TRUST4")

#List of data frames example
S1 <- read.csv("~/Documents/MyExperiment/Sample1/outs/barcode_results.csv")
S2 <- read.csv("~/Documents/MyExperiment/Sample2/outs/barcode_results.csv")
S3 <- read.csv("~/Documents/MyExperiment/Sample3/outs/barcode_results.csv")
S4 <- read.csv("~/Documents/MyExperiment/Sample4/outs/barcode_results.csv")

contig_list <- list(S1, S2, S3, S4)
contig.list <- loadContigs(input = contig_list, 
                           format = "WAT3R")

1.2.3 Multiplexed Experiment

The contig list for a multiplexed experiment can be created by first generating a single-cell RNA object (either Seurat or SingleCellExperiment), loading the filtered contig file, and then using createHTOContigList(). This function returns a list separated by the group.by variable(s).

This function depends on the match of barcodes between the single-cell object and contigs. If there is a prefix or different suffix added to the barcode, this will result in no contigs recovered. Currently, it is recommended you do this step before the integration, as integration workflows commonly alter the barcodes. There is a multi.run variable that can be used on the integrated object. However, it assumes you have modified the barcodes with the Seurat pipeline (automatic addition of _# to end), and your contig list is in the same order.

contigs <- read.csv(".../outs/filtered_contig_annotations.csv")

contig.list <- createHTOContigList(contigs, 
                                   Seurat.Obj, 
                                   group.by = "HTO_maxID")
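
Because createHTOContigList() relies on matching barcodes, it can help to check the overlap before calling it. The following base-R sketch illustrates such a check. The barcode vectors and the "Run1_" prefix are invented for illustration; with real data, the barcodes would come from the single-cell object and from the barcode column of the contig file.

```r
# Toy barcodes: the single-cell object carries a hypothetical "Run1_" prefix
object.barcodes <- c("Run1_AAACCTGAGTACGACG-1", "Run1_AAACCTGCAACACGCC-1")
contig.barcodes <- c("AAACCTGAGTACGACG-1", "AAACCTGCAACACGCC-1",
                     "AAACCTGCAGGCGATA-1")

# Fraction of contig barcodes found in the object; 0 signals a naming mismatch
overlap.before <- mean(contig.barcodes %in% object.barcodes)

# Remove the prefix and check again
stripped <- sub("^Run1_", "", object.barcodes)
overlap.after <- mean(contig.barcodes %in% stripped)
```

An overlap of zero usually points to a prefix or suffix difference rather than to a true absence of shared cells.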

1.3 Example Data in scRepertoire

scRepertoire comes with a data set of T cells derived from four patients with acute respiratory distress to demonstrate the functionality of the R package. More information on the data set can be found in the corresponding manuscript. The samples consist of paired peripheral blood (B) and bronchoalveolar lavage (L) samples, effectively creating 8 distinct runs for T cell receptor (TCR) enrichment. We can preview the elements in the list by using the head function and looking at the first contig annotation.

The built-in example data is derived from the 10x Cell Ranger pipeline, so it is ready to go for downstream processing and analysis.

data("contig_list") #the data built into scRepertoire

head(contig_list[[1]])
##              barcode is_cell                   contig_id high_confidence length
## 1 AAACCTGAGTACGACG-1    True AAACCTGAGTACGACG-1_contig_1            True    500
## 2 AAACCTGAGTACGACG-1    True AAACCTGAGTACGACG-1_contig_2            True    478
## 4 AAACCTGCAACACGCC-1    True AAACCTGCAACACGCC-1_contig_1            True    506
## 5 AAACCTGCAACACGCC-1    True AAACCTGCAACACGCC-1_contig_2            True    470
## 6 AAACCTGCAGGCGATA-1    True AAACCTGCAGGCGATA-1_contig_1            True    558
## 7 AAACCTGCAGGCGATA-1    True AAACCTGCAGGCGATA-1_contig_2            True    505
##   chain       v_gene d_gene  j_gene c_gene full_length productive
## 1   TRA       TRAV25   None  TRAJ20   TRAC        True       True
## 2   TRB      TRBV5-1   None TRBJ2-7  TRBC2        True       True
## 4   TRA TRAV38-2/DV8   None  TRAJ52   TRAC        True       True
## 5   TRB     TRBV10-3   None TRBJ2-2  TRBC2        True       True
## 6   TRA     TRAV12-1   None   TRAJ9   TRAC        True       True
## 7   TRB        TRBV9   None TRBJ2-2  TRBC2        True       True
##                 cdr3                                                cdr3_nt
## 1        CGCSNDYKLSF                      TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT
## 2     CASSLTDRTYEQYF             TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 4 CAYRSAQAGGTSYGKLTF TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT
## 5      CAISEQGKGELFF                TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 6     CVVSDNTGGFKTIF             TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## 7  CASSVRRERANTGELFF    TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
##   reads umis raw_clonotype_id         raw_consensus_id
## 1  8344    4     clonotype123 clonotype123_consensus_2
## 2 65390   38     clonotype123 clonotype123_consensus_1
## 4 18372    8     clonotype124 clonotype124_consensus_1
## 5 34054    9     clonotype124 clonotype124_consensus_2
## 6  5018    2       clonotype1   clonotype1_consensus_2
## 7 25110   11       clonotype1   clonotype1_consensus_1

2 Combining Contigs into Clones

2.1 combineTCR

input.data

  • List of filtered_contig_annotations.csv data frames from the 10x Cell Ranger.
  • List of data processed using loadContigs().

samples and ID

  • Grouping variables for downstream analysis; they are added as prefixes to the barcodes to prevent issues with duplicate barcodes (optional).

removeNA

  • TRUE - Filter to remove any cell barcode with an NA value in at least one of the chains.
  • FALSE - Include and incorporate cells with 1 NA value (default).

removeMulti

  • TRUE - Filter to remove any cell barcode with more than 2 immune receptor chains.
  • FALSE - Include and incorporate cells with > 2 chains (default).

filterMulti

  • TRUE - Isolate the top 2 expressed chains in cell barcodes with multiple chains.
  • FALSE - Include and incorporate cells with > 2 chains (default).
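
To see which barcodes the removeNA and removeMulti filters would affect, the chains per barcode can be tallied directly from a contig data frame. This is a base-R sketch on a toy data frame with barcode and chain columns, as found in filtered_contig_annotations.csv. It illustrates the idea only and is not the internal logic of combineTCR().

```r
# Toy contigs: cell "A" has both chains, "B" has only TRB, "C" has three chains
contigs <- data.frame(
  barcode = c("A", "A", "B", "C", "C", "C"),
  chain   = c("TRA", "TRB", "TRB", "TRA", "TRA", "TRB")
)

# Rows are barcodes, columns are chains, entries are contig counts
chain.counts <- table(contigs$barcode, contigs$chain)

# Barcodes missing one of the chains (what removeNA = TRUE would target)
missing.chain <- rownames(chain.counts)[rowSums(chain.counts == 0) > 0]

# Barcodes with more than 2 chains (what removeMulti = TRUE would target)
multi.chain <- rownames(chain.counts)[rowSums(chain.counts) > 2]
```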

The output of combineTCR() will be a list of contig data frames that will be reduced to the reads associated with a single cell barcode. It will also combine the multiple reads into clone calls by either the nucleotide sequence (CTnt), amino acid sequence (CTaa), the VDJC gene sequence (CTgene), or the combination of the nucleotide and gene sequence (CTstrict).

combined.TCR <- combineTCR(contig_list, 
                           samples = c("P17B", "P17L", "P18B", "P18L", 
                                            "P19B","P19L", "P20B", "P20L"),
                           removeNA = FALSE, 
                           removeMulti = FALSE, 
                           filterMulti = FALSE)

head(combined.TCR[[1]])
##                    barcode sample                     TCR1           cdr3_aa1
## 1  P17B_AAACCTGAGTACGACG-1   P17B       TRAV25.TRAJ20.TRAC        CGCSNDYKLSF
## 3  P17B_AAACCTGCAACACGCC-1   P17B TRAV38-2/DV8.TRAJ52.TRAC CAYRSAQAGGTSYGKLTF
## 5  P17B_AAACCTGCAGGCGATA-1   P17B      TRAV12-1.TRAJ9.TRAC     CVVSDNTGGFKTIF
## 7  P17B_AAACCTGCATGAGCGA-1   P17B      TRAV12-1.TRAJ9.TRAC     CVVSDNTGGFKTIF
## 9  P17B_AAACGGGAGAGCCCAA-1   P17B        TRAV20.TRAJ8.TRAC      CAVRGEGFQKLVF
## 10 P17B_AAACGGGAGCGTTTAC-1   P17B      TRAV12-1.TRAJ9.TRAC     CVVSDNTGGFKTIF
##                                                  cdr3_nt1
## 1                       TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT
## 3  TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT
## 5              TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## 7              TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## 9                 TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT
## 10             TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
##                           TCR2          cdr3_aa2
## 1   TRBV5-1.None.TRBJ2-7.TRBC2    CASSLTDRTYEQYF
## 3  TRBV10-3.None.TRBJ2-2.TRBC2     CAISEQGKGELFF
## 5     TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
## 7     TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
## 9                         <NA>              <NA>
## 10    TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
##                                               cdr3_nt2
## 1           TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3              TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5  TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7  TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9                                                 <NA>
## 10 TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
##                                                  CTgene
## 1         TRAV25.TRAJ20.TRAC_TRBV5-1.None.TRBJ2-7.TRBC2
## 3  TRAV38-2/DV8.TRAJ52.TRAC_TRBV10-3.None.TRBJ2-2.TRBC2
## 5          TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
## 7          TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
## 9                                  TRAV20.TRAJ8.TRAC_NA
## 10         TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
##                                                                                              CTnt
## 1                    TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT_TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3  TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT_TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5  TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7  TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9                                                      TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT_NA
## 10 TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
##                                CTaa
## 1        CGCSNDYKLSF_CASSLTDRTYEQYF
## 3  CAYRSAQAGGTSYGKLTF_CAISEQGKGELFF
## 5  CVVSDNTGGFKTIF_CASSVRRERANTGELFF
## 7  CVVSDNTGGFKTIF_CASSVRRERANTGELFF
## 9                  CAVRGEGFQKLVF_NA
## 10 CVVSDNTGGFKTIF_CASSVRRERANTGELFF
##                                                                                                                                               CTstrict
## 1                           TRAV25.TRAJ20.TRAC;TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT_TRBV5-1.None.TRBJ2-7.TRBC2;TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3  TRAV38-2/DV8.TRAJ52.TRAC;TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT_TRBV10-3.None.TRBJ2-2.TRBC2;TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5          TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7          TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9                                                                                      TRAV20.TRAJ8.TRAC;TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT_NA;NA
## 10         TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
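
The clone-call columns in this output are concatenations of the per-chain columns. The base-R sketch below reproduces the pattern for two cells, with values copied from rows 1 and 9 of the output above. It is an illustration of the column layout, not the code combineTCR() runs internally.

```r
# One row per cell; the second cell has no TRB chain
cells <- data.frame(
  TCR1     = c("TRAV25.TRAJ20.TRAC", "TRAV20.TRAJ8.TRAC"),
  cdr3_aa1 = c("CGCSNDYKLSF", "CAVRGEGFQKLVF"),
  cdr3_nt1 = c("TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT",
               "TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT"),
  TCR2     = c("TRBV5-1.None.TRBJ2-7.TRBC2", NA),
  cdr3_aa2 = c("CASSLTDRTYEQYF", NA),
  cdr3_nt2 = c("TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC", NA)
)

# paste() turns a missing chain into the literal string "NA"
cells$CTgene   <- paste(cells$TCR1, cells$TCR2, sep = "_")
cells$CTnt     <- paste(cells$cdr3_nt1, cells$cdr3_nt2, sep = "_")
cells$CTaa     <- paste(cells$cdr3_aa1, cells$cdr3_aa2, sep = "_")
# Strict call: genes and nucleotide sequence per chain, chains joined by "_"
cells$CTstrict <- paste(paste(cells$TCR1, cells$cdr3_nt1, sep = ";"),
                        paste(cells$TCR2, cells$cdr3_nt2, sep = ";"),
                        sep = "_")
```

The second cell shows why single-chain cells end in _NA (or _NA;NA for CTstrict) when removeNA = FALSE.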

2.2 combineBCR

combineBCR() is analogous to combineTCR() with 2 major changes: 1) each barcode can have a maximum of 2 sequences; if more exist, the 2 with the highest reads are selected; 2) the strict definition of a clone is based on the normalized Levenshtein edit distance of CDR3 nucleotide sequences and V-gene usage. For more information on this approach, please see the respective citation. This definition allows for the grouping of BCRs derived from the same progenitor that have diverged through somatic hypermutation and affinity maturation.

threshold
The level of similarity in sequences to group together. Default is 0.85.

\[ \text{threshold}(s, t) = 1-\frac{\text{Levenshtein}(s, t)}{\frac{\text{length}(s) + \text{length}(t)}{2}} \]

call.related.clones
Calculate the normalized edit distance (TRUE) or skip the calculation (FALSE). Skipping the edit distance calculation may save time, especially in the context of large data sets, but is not recommended.
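
The threshold formula can be evaluated in base R with utils::adist(), which computes the Levenshtein distance with its default costs. The sketch below covers only the formula. The function name normalizedSimilarity() and the two toy sequences are hypothetical, and the V-gene matching that combineBCR() also applies is not shown.

```r
# Normalized similarity: 1 - Levenshtein(s, t) / mean(length(s), length(t))
normalizedSimilarity <- function(s, t) {
  edit.distance <- drop(utils::adist(s, t))  # Levenshtein distance (base R)
  1 - edit.distance / ((nchar(s) + nchar(t)) / 2)
}

# Two toy CDR3 nucleotide sequences of length 24 that differ by one substitution
seq.a <- "TGTGCGACTACGTCTCCACATGTT"
seq.b <- "TGTGCGACTACGTCTCCACATGAT"
sim <- normalizedSimilarity(seq.a, seq.b)

# Compare against the default threshold of 0.85
passes <- sim >= 0.85
```

Here the similarity is 1 - 1/24, so the pair would fall above the default threshold. Shorter sequences are penalized more for each edit because the denominator is the mean length.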

BCR.contigs <- read.csv("https://www.borch.dev/uploads/contigs/b_contigs.csv")
combined.BCR <- combineBCR(BCR.contigs, 
                           samples = "P1", 
                           threshold = 0.85)

head(combined.BCR[[1]])
##                 barcode sample                          IGH
## 1 P1_CGGGTCAAGTACGACG-1     P1        None.None.IGHJ1.IGHA2
## 2 P1_TCAATCTTCGATAGAA-1     P1                         <NA>
## 3 P1_AAACGGGAGCTTATCG-1     P1      IGHV3-23.None.None.None
## 4 P1_GTTCATTTCTTTAGGG-1     P1                         <NA>
## 5 P1_AGTGAGGGTAAATACG-1     P1 IGHV1-2.IGHD2-21.IGHJ4.IGHG1
## 6 P1_TCTGAGATCCCTCTTT-1     P1      IGHV3-15.None.None.None
##                 cdr3_aa1
## 1                   None
## 2                   <NA>
## 3                   None
## 4                   <NA>
## 5 CATTSPHVVVVPVADPPPFGHW
## 6                   None
##                                                             cdr3_nt1
## 1                                                               None
## 2                                                               <NA>
## 3                                                               None
## 4                                                               <NA>
## 5 TGTGCGACTACGTCTCCACATGTTGTTGTGGTGCCAGTTGCCGATCCCCCCCCCTTTGGCCACTGG
## 6                                                               None
##                   IGLC cdr3_aa2 cdr3_nt2
## 1   IGLV1-44.None.None     None     None
## 2 IGLV2-11.IGLJ1.IGLC2     None     None
## 3   IGLV1-47.None.None     None     None
## 4 IGLV2-11.IGLJ1.IGLC1     None     None
## 5   IGLV1-47.None.None     None     None
## 6  IGKV2D-28.None.None     None     None
##                                            CTgene
## 1        None.None.IGHJ1.IGHA2_IGLV1-44.None.None
## 2                         NA_IGLV2-11.IGLJ1.IGLC2
## 3      IGHV3-23.None.None.None_IGLV1-47.None.None
## 4                         NA_IGLV2-11.IGLJ1.IGLC1
## 5 IGHV1-2.IGHD2-21.IGHJ4.IGHG1_IGLV1-47.None.None
## 6     IGHV3-15.None.None.None_IGKV2D-28.None.None
##                                                                      CTnt
## 1                                                               None_None
## 2                                                                 NA_None
## 3                                                               None_None
## 4                                                                 NA_None
## 5 TGTGCGACTACGTCTCCACATGTTGTTGTGGTGCCAGTTGCCGATCCCCCCCCCTTTGGCCACTGG_None
## 6                                                               None_None
##                          CTaa                          CTstrict
## 1                   None_None               NA.None_NA.IGLV1-44
## 2                     NA_None                 NA.NA_NA.IGLV2-11
## 3                   None_None           NA.IGHV3-23_NA.IGLV1-47
## 4                     NA_None                 NA.NA_NA.IGLV2-11
## 5 CATTSPHVVVVPVADPPPFGHW_None IGH:Cluster.6.IGHV1-2_NA.IGLV1-47
## 6                   None_None          NA.IGHV3-15_NA.IGKV2D-28

3 Additional Processing

3.1 addVariable

What if there are more variables to add than just sample and ID? We can add them using the addVariable() function. All we need is the variable.name of the variable we would like to add and the specific character or numeric values (variables), one per list element. As an example, here we add the Type in which the samples were processed and sequenced.

combined.TCR <- addVariable(combined.TCR, 
                            variable.name = "Type", 
                            variables = rep(c("B", "L"), 4))

head(combined.TCR[[1]])
##                    barcode sample                     TCR1           cdr3_aa1
## 1  P17B_AAACCTGAGTACGACG-1   P17B       TRAV25.TRAJ20.TRAC        CGCSNDYKLSF
## 3  P17B_AAACCTGCAACACGCC-1   P17B TRAV38-2/DV8.TRAJ52.TRAC CAYRSAQAGGTSYGKLTF
## 5  P17B_AAACCTGCAGGCGATA-1   P17B      TRAV12-1.TRAJ9.TRAC     CVVSDNTGGFKTIF
## 7  P17B_AAACCTGCATGAGCGA-1   P17B      TRAV12-1.TRAJ9.TRAC     CVVSDNTGGFKTIF
## 9  P17B_AAACGGGAGAGCCCAA-1   P17B        TRAV20.TRAJ8.TRAC      CAVRGEGFQKLVF
## 10 P17B_AAACGGGAGCGTTTAC-1   P17B      TRAV12-1.TRAJ9.TRAC     CVVSDNTGGFKTIF
##                                                  cdr3_nt1
## 1                       TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT
## 3  TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT
## 5              TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## 7              TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## 9                 TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT
## 10             TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
##                           TCR2          cdr3_aa2
## 1   TRBV5-1.None.TRBJ2-7.TRBC2    CASSLTDRTYEQYF
## 3  TRBV10-3.None.TRBJ2-2.TRBC2     CAISEQGKGELFF
## 5     TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
## 7     TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
## 9                         <NA>              <NA>
## 10    TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
##                                               cdr3_nt2
## 1           TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3              TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5  TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7  TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9                                                 <NA>
## 10 TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
##                                                  CTgene
## 1         TRAV25.TRAJ20.TRAC_TRBV5-1.None.TRBJ2-7.TRBC2
## 3  TRAV38-2/DV8.TRAJ52.TRAC_TRBV10-3.None.TRBJ2-2.TRBC2
## 5          TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
## 7          TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
## 9                                  TRAV20.TRAJ8.TRAC_NA
## 10         TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
##                                                                                              CTnt
## 1                    TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT_TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3  TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT_TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5  TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7  TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9                                                      TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT_NA
## 10 TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
##                                CTaa
## 1        CGCSNDYKLSF_CASSLTDRTYEQYF
## 3  CAYRSAQAGGTSYGKLTF_CAISEQGKGELFF
## 5  CVVSDNTGGFKTIF_CASSVRRERANTGELFF
## 7  CVVSDNTGGFKTIF_CASSVRRERANTGELFF
## 9                  CAVRGEGFQKLVF_NA
## 10 CVVSDNTGGFKTIF_CASSVRRERANTGELFF
##                                                                                                                                               CTstrict
## 1                           TRAV25.TRAJ20.TRAC;TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT_TRBV5-1.None.TRBJ2-7.TRBC2;TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3  TRAV38-2/DV8.TRAJ52.TRAC;TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT_TRBV10-3.None.TRBJ2-2.TRBC2;TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5          TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7          TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9                                                                                      TRAV20.TRAJ8.TRAC;TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT_NA;NA
## 10         TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
##    Type
## 1     B
## 3     B
## 5     B
## 7     B
## 9     B
## 10    B
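
Conceptually, adding a variable means assigning one value per list element and repeating it down the rows of that element. The base-R sketch below shows the idea on a toy list. The object names and barcodes are hypothetical, and this is not the source of addVariable().

```r
# Toy list of two per-sample data frames
combined <- list(
  P17B = data.frame(barcode = c("P17B_A", "P17B_B")),
  P17L = data.frame(barcode = "P17L_A")
)
types <- c("B", "L")  # one value per list element, in list order

# Recycle each value down the rows of its matching data frame
combined <- Map(function(df, value) {
  df$Type <- value
  df
}, combined, types)
```

Because the values are matched by position, the order of the vector has to follow the order of the list.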

3.2 subsetClones

Likewise, we can remove specific list elements after combineTCR() using the subsetClones() function. In order to subset, we need to identify the vector we would like to use for subsetting (name) and the variable values to subset (variables). Below, we isolate just the 2 sequencing results from P18L and P18B.

subset1 <- subsetClones(combined.TCR, 
                        name = "sample", 
                        variables = c("P18L", "P18B"))

head(subset1[[1]])
##                    barcode sample                 TCR1         cdr3_aa1
## 1  P18B_AAACCTGAGGCTCAGA-1   P18B TRAV26-1.TRAJ37.TRAC  CIVRGGSSNTGKLIF
## 3  P18B_AAACCTGCATGACATC-1   P18B    TRAV3.TRAJ20.TRAC    CAVQRSNDYKLSF
## 5  P18B_AAACCTGGTATGCTTG-1   P18B TRAV26-1.TRAJ53.TRAC   CIGSSGGSNYKLTF
## 8  P18B_AAACGGGCAGATGGGT-1   P18B                 <NA>             <NA>
## 9  P18B_AAACGGGTCTTACCGC-1   P18B    TRAV20.TRAJ9.TRAC CAVQAKRYTGGFKTIF
## 12 P18B_AAAGATGAGTTACGGG-1   P18B   TRAV8-3.TRAJ8.TRAC   CAVGGDTGFQKLVF
##                                            cdr3_nt1
## 1     TGCATCGTCAGGGGCGGCTCTAGCAACACAGGCAAACTAATCTTT
## 3           TGTGCTGTGCAACGTTCTAACGACTACAAGCTCAGCTTT
## 5        TGCATCGGCTCAAGTGGAGGTAGCAACTATAAACTGACATTT
## 8                                              <NA>
## 9  TGTGCTGTGCAGGCCAAGCGGTATACTGGAGGCTTCAAAACTATCTTT
## 12       TGTGCTGTGGGTGGTGACACAGGCTTTCAGAAACTTGTATTT
##                                                     TCR2
## 1                             TRBV6-1.None.TRBJ2-3.TRBC2
## 3                             TRBV3-1.None.TRBJ2-3.TRBC2
## 5   TRBV4-1.None.TRBJ2-2.TRBC2;TRBV19.None.TRBJ1-5.TRBC1
## 8                             TRBV5-1.None.TRBJ1-2.TRBC1
## 9  TRBV5-1.None.TRBJ1-1.TRBC1;TRBV7-9.None.TRBJ2-2.TRBC2
## 12                           TRBV12-4.None.TRBJ1-1.TRBC1
##                          cdr3_aa2
## 1                 CASIGRSFGRDTQYF
## 3                CASSPPRGGFTDTQYF
## 5  CASSQGGQGGRELFF;CASSYAVGRQPQHF
## 8                  CASSLRETNYGYTF
## 9  CASSLGTGTGVEAFF;CAIDPGLLTGELFF
## 12                  CASRNSQATEAFF
##                                                                                    cdr3_nt2
## 1                                             TGTGCCAGTATCGGGAGGTCCTTTGGCCGAGATACGCAGTATTTT
## 3                                          TGTGCCAGCAGCCCCCCCCGCGGCGGATTCACAGATACGCAGTATTTT
## 5  TGCGCCAGCAGCCAAGGTGGACAGGGCGGAAGGGAGCTGTTTTTT;TGTGCCAGTAGCTACGCGGTGGGGAGGCAGCCCCAGCATTTT
## 8                                                TGCGCCAGCAGCTTGAGGGAAACCAACTATGGCTACACCTTC
## 9  TGCGCCAGCAGCTTGGGAACGGGGACAGGGGTTGAAGCTTTCTTT;TGTGCCATCGATCCGGGACTACTCACCGGGGAGCTGTTTTTT
## 12                                                  TGTGCCAGCAGAAACTCCCAAGCCACTGAAGCTTTCTTT
##                                                                       CTgene
## 1                            TRAV26-1.TRAJ37.TRAC_TRBV6-1.None.TRBJ2-3.TRBC2
## 3                               TRAV3.TRAJ20.TRAC_TRBV3-1.None.TRBJ2-3.TRBC2
## 5  TRAV26-1.TRAJ53.TRAC_TRBV4-1.None.TRBJ2-2.TRBC2;TRBV19.None.TRBJ1-5.TRBC1
## 8                                              NA_TRBV5-1.None.TRBJ1-2.TRBC1
## 9    TRAV20.TRAJ9.TRAC_TRBV5-1.None.TRBJ1-1.TRBC1;TRBV7-9.None.TRBJ2-2.TRBC2
## 12                            TRAV8-3.TRAJ8.TRAC_TRBV12-4.None.TRBJ1-1.TRBC1
##                                                                                                                                         CTnt
## 1                                                TGCATCGTCAGGGGCGGCTCTAGCAACACAGGCAAACTAATCTTT_TGTGCCAGTATCGGGAGGTCCTTTGGCCGAGATACGCAGTATTTT
## 3                                                   TGTGCTGTGCAACGTTCTAACGACTACAAGCTCAGCTTT_TGTGCCAGCAGCCCCCCCCGCGGCGGATTCACAGATACGCAGTATTTT
## 5        TGCATCGGCTCAAGTGGAGGTAGCAACTATAAACTGACATTT_TGCGCCAGCAGCCAAGGTGGACAGGGCGGAAGGGAGCTGTTTTTT;TGTGCCAGTAGCTACGCGGTGGGGAGGCAGCCCCAGCATTTT
## 8                                                                                              NA_TGCGCCAGCAGCTTGAGGGAAACCAACTATGGCTACACCTTC
## 9  TGTGCTGTGCAGGCCAAGCGGTATACTGGAGGCTTCAAAACTATCTTT_TGCGCCAGCAGCTTGGGAACGGGGACAGGGGTTGAAGCTTTCTTT;TGTGCCATCGATCCGGGACTACTCACCGGGGAGCTGTTTTTT
## 12                                                        TGTGCTGTGGGTGGTGACACAGGCTTTCAGAAACTTGTATTT_TGTGCCAGCAGAAACTCCCAAGCCACTGAAGCTTTCTTT
##                                               CTaa
## 1                  CIVRGGSSNTGKLIF_CASIGRSFGRDTQYF
## 3                   CAVQRSNDYKLSF_CASSPPRGGFTDTQYF
## 5    CIGSSGGSNYKLTF_CASSQGGQGGRELFF;CASSYAVGRQPQHF
## 8                                NA_CASSLRETNYGYTF
## 9  CAVQAKRYTGGFKTIF_CASSLGTGTGVEAFF;CAIDPGLLTGELFF
## 12                    CAVGGDTGFQKLVF_CASRNSQATEAFF
##                                                                                                                                                                                                             CTstrict
## 1                                                                        TRAV26-1.TRAJ37.TRAC;TGCATCGTCAGGGGCGGCTCTAGCAACACAGGCAAACTAATCTTT_TRBV6-1.None.TRBJ2-3.TRBC2;TGTGCCAGTATCGGGAGGTCCTTTGGCCGAGATACGCAGTATTTT
## 3                                                                              TRAV3.TRAJ20.TRAC;TGTGCTGTGCAACGTTCTAACGACTACAAGCTCAGCTTT_TRBV3-1.None.TRBJ2-3.TRBC2;TGTGCCAGCAGCCCCCCCCGCGGCGGATTCACAGATACGCAGTATTTT
## 5      TRAV26-1.TRAJ53.TRAC;TGCATCGGCTCAAGTGGAGGTAGCAACTATAAACTGACATTT_TRBV4-1.None.TRBJ2-2.TRBC2;TRBV19.None.TRBJ1-5.TRBC1;TGCGCCAGCAGCCAAGGTGGACAGGGCGGAAGGGAGCTGTTTTTT;TGTGCCAGTAGCTACGCGGTGGGGAGGCAGCCCCAGCATTTT
## 8                                                                                                                                        NA;NA_TRBV5-1.None.TRBJ1-2.TRBC1;TGCGCCAGCAGCTTGAGGGAAACCAACTATGGCTACACCTTC
## 9  TRAV20.TRAJ9.TRAC;TGTGCTGTGCAGGCCAAGCGGTATACTGGAGGCTTCAAAACTATCTTT_TRBV5-1.None.TRBJ1-1.TRBC1;TRBV7-9.None.TRBJ2-2.TRBC2;TGCGCCAGCAGCTTGGGAACGGGGACAGGGGTTGAAGCTTTCTTT;TGTGCCATCGATCCGGGACTACTCACCGGGGAGCTGTTTTTT
## 12                                                                                 TRAV8-3.TRAJ8.TRAC;TGTGCTGTGGGTGGTGACACAGGCTTTCAGAAACTTGTATTT_TRBV12-4.None.TRBJ1-1.TRBC1;TGTGCCAGCAGAAACTCCCAAGCCACTGAAGCTTTCTTT
##    Type
## 1     B
## 3     B
## 5     B
## 8     B
## 9     B
## 12    B
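
The same kind of selection can be sketched in base R by filtering list elements on a column value. The toy list below is hypothetical, and the code only illustrates the idea behind subsetClones() rather than its implementation.

```r
# Toy list: each element holds the cells of one sample
combined <- list(
  P17B = data.frame(barcode = "P17B_A", sample = "P17B"),
  P18B = data.frame(barcode = "P18B_A", sample = "P18B"),
  P18L = data.frame(barcode = "P18L_A", sample = "P18L")
)

# Keep the list elements whose sample column matches a requested value
keep <- c("P18L", "P18B")
subset.sketch <- Filter(function(df) all(df$sample %in% keep), combined)
kept.names <- names(subset.sketch)
```

In this sketch the kept elements follow the order of the list, not the order of the requested values, so P18B comes first.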

Alternatively, we can also just select the list elements after combineTCR() or combineBCR().

subset2 <- combined.TCR[c(3,4)]
head(subset2[[1]])
##                    barcode sample                 TCR1         cdr3_aa1
## 1  P18B_AAACCTGAGGCTCAGA-1   P18B TRAV26-1.TRAJ37.TRAC  CIVRGGSSNTGKLIF
## 3  P18B_AAACCTGCATGACATC-1   P18B    TRAV3.TRAJ20.TRAC    CAVQRSNDYKLSF
## 5  P18B_AAACCTGGTATGCTTG-1   P18B TRAV26-1.TRAJ53.TRAC   CIGSSGGSNYKLTF
## 8  P18B_AAACGGGCAGATGGGT-1   P18B                 <NA>             <NA>
## 9  P18B_AAACGGGTCTTACCGC-1   P18B    TRAV20.TRAJ9.TRAC CAVQAKRYTGGFKTIF
## 12 P18B_AAAGATGAGTTACGGG-1   P18B   TRAV8-3.TRAJ8.TRAC   CAVGGDTGFQKLVF
##                                            cdr3_nt1
## 1     TGCATCGTCAGGGGCGGCTCTAGCAACACAGGCAAACTAATCTTT
## 3           TGTGCTGTGCAACGTTCTAACGACTACAAGCTCAGCTTT
## 5        TGCATCGGCTCAAGTGGAGGTAGCAACTATAAACTGACATTT
## 8                                              <NA>
## 9  TGTGCTGTGCAGGCCAAGCGGTATACTGGAGGCTTCAAAACTATCTTT
## 12       TGTGCTGTGGGTGGTGACACAGGCTTTCAGAAACTTGTATTT
##                                                     TCR2
## 1                             TRBV6-1.None.TRBJ2-3.TRBC2
## 3                             TRBV3-1.None.TRBJ2-3.TRBC2
## 5   TRBV4-1.None.TRBJ2-2.TRBC2;TRBV19.None.TRBJ1-5.TRBC1
## 8                             TRBV5-1.None.TRBJ1-2.TRBC1
## 9  TRBV5-1.None.TRBJ1-1.TRBC1;TRBV7-9.None.TRBJ2-2.TRBC2
## 12                           TRBV12-4.None.TRBJ1-1.TRBC1
##                          cdr3_aa2
## 1                 CASIGRSFGRDTQYF
## 3                CASSPPRGGFTDTQYF
## 5  CASSQGGQGGRELFF;CASSYAVGRQPQHF
## 8                  CASSLRETNYGYTF
## 9  CASSLGTGTGVEAFF;CAIDPGLLTGELFF
## 12                  CASRNSQATEAFF
##                                                                                    cdr3_nt2
## 1                                             TGTGCCAGTATCGGGAGGTCCTTTGGCCGAGATACGCAGTATTTT
## 3                                          TGTGCCAGCAGCCCCCCCCGCGGCGGATTCACAGATACGCAGTATTTT
## 5  TGCGCCAGCAGCCAAGGTGGACAGGGCGGAAGGGAGCTGTTTTTT;TGTGCCAGTAGCTACGCGGTGGGGAGGCAGCCCCAGCATTTT
## 8                                                TGCGCCAGCAGCTTGAGGGAAACCAACTATGGCTACACCTTC
## 9  TGCGCCAGCAGCTTGGGAACGGGGACAGGGGTTGAAGCTTTCTTT;TGTGCCATCGATCCGGGACTACTCACCGGGGAGCTGTTTTTT
## 12                                                  TGTGCCAGCAGAAACTCCCAAGCCACTGAAGCTTTCTTT
##                                                                       CTgene
## 1                            TRAV26-1.TRAJ37.TRAC_TRBV6-1.None.TRBJ2-3.TRBC2
## 3                               TRAV3.TRAJ20.TRAC_TRBV3-1.None.TRBJ2-3.TRBC2
## 5  TRAV26-1.TRAJ53.TRAC_TRBV4-1.None.TRBJ2-2.TRBC2;TRBV19.None.TRBJ1-5.TRBC1
## 8                                              NA_TRBV5-1.None.TRBJ1-2.TRBC1
## 9    TRAV20.TRAJ9.TRAC_TRBV5-1.None.TRBJ1-1.TRBC1;TRBV7-9.None.TRBJ2-2.TRBC2
## 12                            TRAV8-3.TRAJ8.TRAC_TRBV12-4.None.TRBJ1-1.TRBC1
##                                                                                                                                         CTnt
## 1                                                TGCATCGTCAGGGGCGGCTCTAGCAACACAGGCAAACTAATCTTT_TGTGCCAGTATCGGGAGGTCCTTTGGCCGAGATACGCAGTATTTT
## 3                                                   TGTGCTGTGCAACGTTCTAACGACTACAAGCTCAGCTTT_TGTGCCAGCAGCCCCCCCCGCGGCGGATTCACAGATACGCAGTATTTT
## 5        TGCATCGGCTCAAGTGGAGGTAGCAACTATAAACTGACATTT_TGCGCCAGCAGCCAAGGTGGACAGGGCGGAAGGGAGCTGTTTTTT;TGTGCCAGTAGCTACGCGGTGGGGAGGCAGCCCCAGCATTTT
## 8                                                                                              NA_TGCGCCAGCAGCTTGAGGGAAACCAACTATGGCTACACCTTC
## 9  TGTGCTGTGCAGGCCAAGCGGTATACTGGAGGCTTCAAAACTATCTTT_TGCGCCAGCAGCTTGGGAACGGGGACAGGGGTTGAAGCTTTCTTT;TGTGCCATCGATCCGGGACTACTCACCGGGGAGCTGTTTTTT
## 12                                                        TGTGCTGTGGGTGGTGACACAGGCTTTCAGAAACTTGTATTT_TGTGCCAGCAGAAACTCCCAAGCCACTGAAGCTTTCTTT
##                                               CTaa
## 1                  CIVRGGSSNTGKLIF_CASIGRSFGRDTQYF
## 3                   CAVQRSNDYKLSF_CASSPPRGGFTDTQYF
## 5    CIGSSGGSNYKLTF_CASSQGGQGGRELFF;CASSYAVGRQPQHF
## 8                                NA_CASSLRETNYGYTF
## 9  CAVQAKRYTGGFKTIF_CASSLGTGTGVEAFF;CAIDPGLLTGELFF
## 12                    CAVGGDTGFQKLVF_CASRNSQATEAFF
##                                                                                                                                                                                                             CTstrict
## 1                                                                        TRAV26-1.TRAJ37.TRAC;TGCATCGTCAGGGGCGGCTCTAGCAACACAGGCAAACTAATCTTT_TRBV6-1.None.TRBJ2-3.TRBC2;TGTGCCAGTATCGGGAGGTCCTTTGGCCGAGATACGCAGTATTTT
## 3                                                                              TRAV3.TRAJ20.TRAC;TGTGCTGTGCAACGTTCTAACGACTACAAGCTCAGCTTT_TRBV3-1.None.TRBJ2-3.TRBC2;TGTGCCAGCAGCCCCCCCCGCGGCGGATTCACAGATACGCAGTATTTT
## 5      TRAV26-1.TRAJ53.TRAC;TGCATCGGCTCAAGTGGAGGTAGCAACTATAAACTGACATTT_TRBV4-1.None.TRBJ2-2.TRBC2;TRBV19.None.TRBJ1-5.TRBC1;TGCGCCAGCAGCCAAGGTGGACAGGGCGGAAGGGAGCTGTTTTTT;TGTGCCAGTAGCTACGCGGTGGGGAGGCAGCCCCAGCATTTT
## 8                                                                                                                                        NA;NA_TRBV5-1.None.TRBJ1-2.TRBC1;TGCGCCAGCAGCTTGAGGGAAACCAACTATGGCTACACCTTC
## 9  TRAV20.TRAJ9.TRAC;TGTGCTGTGCAGGCCAAGCGGTATACTGGAGGCTTCAAAACTATCTTT_TRBV5-1.None.TRBJ1-1.TRBC1;TRBV7-9.None.TRBJ2-2.TRBC2;TGCGCCAGCAGCTTGGGAACGGGGACAGGGGTTGAAGCTTTCTTT;TGTGCCATCGATCCGGGACTACTCACCGGGGAGCTGTTTTTT
## 12                                                                                 TRAV8-3.TRAJ8.TRAC;TGTGCTGTGGGTGGTGACACAGGCTTTCAGAAACTTGTATTT_TRBV12-4.None.TRBJ1-1.TRBC1;TGTGCCAGCAGAAACTCCCAAGCCACTGAAGCTTTCTTT
##    Type
## 1     B
## 3     B
## 5     B
## 8     B
## 9     B
## 12    B

3.3 exportClones

After assigning the clone by barcode, we can export the paired clonotypes using exportClones() to save for later use or to use in other pipelines.

format

  • “paired” - Export the paired sequences (default).
  • “airr” - Export data in an AIRR-compliant format.
  • “TCRMatch” - Export TCRB chain information.

write.file

  • TRUE, save the file.
  • FALSE, return a data.frame.

dir

  • Directory location to save the csv.

file.name

  • The csv file name.

exportClones(combined, 
             write.file = TRUE,
             dir = "~/Documents/MyExperiment/Sample1/",
             file.name = "clones.csv")

4 Basic Clonal Visualizations

cloneCall

  • “gene” - use the VDJC genes comprising the TCR/Ig
  • “nt” - use the nucleotide sequence of the CDR3 region
  • “aa” - use the amino acid sequence of the CDR3 region
  • “strict” - use the VDJC genes comprising the TCR + the nucleotide sequence of the CDR3 region. This is the proper definition of clonotype. For combineBCR() strict refers to the edit distance clusters + Vgene of the Ig.

It is important to note that the clonotype is called using the combination of genes or nt/aa CDR3 sequences for both loci. As of this implementation of scRepertoire, clonotype calling does not incorporate small variations within the CDR3 sequences. As such, the gene approach is the most sensitive, nt and aa are moderately so, and strict is the most specific. Additionally, the clonotype call tries to incorporate both loci (i.e., both TCRA and TCRB chains), including cases where a single cell barcode has multiple sequences identified (e.g., 2 TCRA chains expressed in one cell). Using the 10x approach, there is a subset of barcodes that only return one of the immune receptor chains. The unreturned chain is assigned an NA value.
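
To make the sensitivity and specificity trade-off concrete, the following is a minimal base-R sketch on a hypothetical four-cell table. The column names mirror the CTgene, CTnt, CTaa and CTstrict columns shown earlier, but the values are invented for illustration, and CTstrict is simplified here to genes pasted onto the nucleotide sequence rather than the exact string scRepertoire builds.

```r
# Hypothetical toy data (not from the vignette): four cells.
# Cells 1-2 are identical, cell 3 has a synonymous nt change,
# cell 4 uses different genes but the same CDR3 sequences.
clones <- data.frame(
  CTgene = c("TRAV1.TRAJ1_TRBV2.TRBJ1", "TRAV1.TRAJ1_TRBV2.TRBJ1",
             "TRAV1.TRAJ1_TRBV2.TRBJ1", "TRAV3.TRAJ2_TRBV5.TRBJ2"),
  CTnt   = c("TGTGCA_TGTGCC", "TGTGCA_TGTGCC",
             "TGTGCG_TGTGCC", "TGTGCA_TGTGCC"),
  CTaa   = c("CA_CA", "CA_CA", "CA_CA", "CA_CA"),
  stringsAsFactors = FALSE
)

# Simplified stand-in for the strict call: genes plus nucleotide sequence.
clones$CTstrict <- paste(clones$CTgene, clones$CTnt, sep = ";")

# Number of distinct clones under each definition.
n_unique <- sapply(clones, function(x) length(unique(x)))
print(n_unique)
```

In this toy table the strict definition separates the most clones, because it requires both the genes and the nucleotide sequence to match, while the looser definitions merge cells that differ in only one of the two.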

4.1 clonalQuant

The first function to explore the clones is clonalQuant(), which returns the total or relative number of unique clones.

scale

  • TRUE - relative percent of unique clones scaled by the total size of the clonal repertoire.
  • FALSE - Report the total number of unique clones (default).

chain

  • “both” for combined chain visualization
  • “TRA”, “TRB”, “TRD”, “TRG”, “IGH” or “IGL” to select single chain

clonalQuant(combined.TCR, 
            cloneCall="strict", 
            chain = "both", 
            scale = TRUE)
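
As a rough illustration of what the scale option changes, the base-R sketch below computes both quantities by hand for a hypothetical vector of per-cell clone labels. It assumes that the scaled value is the number of unique clones divided by the number of cells, times 100, which is a reading of the description above rather than a statement about the package internals.

```r
# Hypothetical clone label for each of eight cells in one sample.
clone_ids <- c("c1", "c1", "c1", "c2", "c2", "c3", "c4", "c4")

# scale = FALSE analogue: total number of unique clones.
n_unique <- length(unique(clone_ids))

# scale = TRUE analogue (assumed): unique clones as a percent of all cells.
n_total <- length(clone_ids)
percent_unique <- 100 * n_unique / n_total

print(c(unique = n_unique, total = n_total, percent = percent_unique))
```

Scaling matters when samples differ in size: a sample with more recovered cells will tend to show more unique clones in absolute terms even if its repertoire is no more diverse.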

Another option is to define the visualization by data classes. Here, we used the combineTCR() list to define the Type variable as part of the naming structure. We can use group.by to organize the visualization by a specific column in the data set.

clonalQuant(combined.TCR, 
            cloneCall="gene", 
            group.by = "Type", 
            scale = TRUE)
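
The grouping idea can be sketched in base R on a hypothetical table of cells: split the clone labels by a Type column, then count unique clones within each group. The data and the percent calculation are assumptions made for illustration and are not output from clonalQuant().

```r
# Hypothetical cells with a clone label and a Type column.
cells <- data.frame(
  clone = c("c1", "c1", "c2", "c3", "c3", "c3", "c4", "c5"),
  Type  = c("B", "B", "B", "L", "L", "L", "L", "L"),
  stringsAsFactors = FALSE
)

# Split clone labels by Type, then summarize each group separately.
by_type <- split(cells$clone, cells$Type)
unique_by_type <- sapply(by_type, function(x) length(unique(x)))
cells_by_type <- lengths(by_type)

# Assumed scaled value: unique clones as a percent of cells in the group.
percent_by_type <- 100 * unique_by_type / cells_by_type
print(rbind(unique = unique_by_type, cells = cells_by_type,
            percent = percent_by_type))
```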

4.2 clonalAbundance

We can also examine the relative distribution of clones by abundance. Here clonalAbundance() will produce a line graph showing the number of clones by the number of instances within the sample or run. As above, we can also group the output by a variable within the contig object using the group.by parameter.

clonalAbundance(combined.TCR, 
                cloneCall = "gene", 
                scale = FALSE)
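
The quantity plotted here can be approximated by hand with two nested calls to table(). The base-R sketch below uses a hypothetical vector of clone labels: the first call counts how many cells carry each clone, and the second counts how many clones share each size. It only illustrates the idea of an abundance distribution and does not reproduce the function's exact output.

```r
# Hypothetical clone label for each of eight cells in one sample.
clone_ids <- c("c1", "c1", "c1", "c2", "c2", "c3", "c4", "c4")

# Instances per clone: how many cells carry each clone.
clone_sizes <- table(clone_ids)

# Abundance distribution: how many clones occur at each size.
abundance <- table(clone_sizes)
print(abundance)
```

In this toy sample one clone is seen once, two clones are seen twice, and one clone is seen three times.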