Contents

1 Introduction

Lefser is metagenomic biomarker discovery tool that is based on LEfSe tool and is published by Huttenhower et al. 2011. Lefser is the R implementation of the LEfSe method.

Using statistical analyses, lefser compares microbial populations of healthy and diseased subjects to discover differencially expressed microorganisms. Lefser than computes effect size, which estimates magnitude of differential expression between the populations for each differentially expressed microorganism. Subclasses of classes can also be assigned and used within the analysis.

2 Installation

To install Bioconductor and the lefser package, run the following commands.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("lefser")

Then load the lefser package.

library(lefser)

3 Overview and example use of lefser

The lefser function can be used with a SummarizedExperiment.

Load the zeller14 example dataset and exclude ‘adenoma’ conditions.

data(zeller14)
zeller14 <- zeller14[, zeller14$study_condition != "adenoma"]

Note. lefser supports only two-group contrasts.

The colData in the SummarizedExperiment dataset contains the grouping column study_condition which includes the ‘control’ and ‘CRC’ groups.

table(zeller14$study_condition)
## 
##     CRC control 
##      91      66

There can be subclasses in each group condition. In the example dataset we include age_category as a subclass of study_condition which includes ‘adults’ and ‘seniors’. This variable will correspond to the blockCol input argument.

table(zeller14$age_category)
## 
##  adult senior 
##     91     66

We can create a contingency table for the two categorical variables.

table(zeller14$age_category, zeller14$study_condition)
##         
##          CRC control
##   adult   45      46
##   senior  46      20

We can now use the lefser function. It provides results as a data.frame with the names of selected microorganisms and their effect size.

res <- lefser(zeller14, groupCol = "study_condition", blockCol = "age_category")
## Warning in lefser(zeller14, groupCol = "study_condition", blockCol =
## "age_category"): Convert counts to relative abundances with 'relativeAb()'
## The outcome variable is specified as 'study_condition' and the reference category is 'CRC'.
##  See `?factor` or `?relevel` to change the reference category.
head(res)
##                                                                                                                                                                                Names
## 1                                                                               k__Bacteria|p__Fusobacteria|c__Fusobacteriia|o__Fusobacteriales|f__Fusobacteriaceae|g__Fusobacterium
## 2                                                                                                k__Bacteria|p__Fusobacteria|c__Fusobacteriia|o__Fusobacteriales|f__Fusobacteriaceae
## 3                                                                                k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Porphyromonas
## 4                                               k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Porphyromonas|s__Porphyromonas_asaccharolytica
## 5 k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Porphyromonas|s__Porphyromonas_asaccharolytica|t__Porphyromonas_asaccharolytica_unclassified
## 6                                                                                                                                       k__Bacteria|p__Fusobacteria|c__Fusobacteriia
##      scores
## 1 -5.270552
## 2 -5.222674
## 3 -5.046829
## 4 -4.952439
## 5 -4.900934
## 6 -4.882968

4 Visualizing results with lefserPlot

lefserPlot(res)

5 Interoperating with phyloseq

When using phyloseq objects, we recommend to extract the data and create a SummarizedExperiment object as follows:

library(phyloseq)
## 
## Attaching package: 'phyloseq'
## The following object is masked from 'package:SummarizedExperiment':
## 
##     distance
## The following object is masked from 'package:Biobase':
## 
##     sampleNames
## The following object is masked from 'package:GenomicRanges':
## 
##     distance
## The following object is masked from 'package:IRanges':
## 
##     distance
fp <- system.file(
    "extdata", "study_1457_split_library_seqs_and_mapping.zip",
    package = "phyloseq"
)
kostic <- suppressWarnings({
    microbio_me_qiime(fp)
})
## Found biom-format file, now parsing it... 
## Done parsing biom... 
## Importing Sample Metdadata from mapping file...
## Merging the imported objects... 
## Successfully merged, phyloseq-class created. 
##  Returning...
counts <- unclass(otu_table(kostic))
colData <- as(sample_data(kostic), "data.frame")
## create a SummarizedExperiment object
SummarizedExperiment(
    assays = list(counts = counts), colData = colData
)
## class: SummarizedExperiment 
## dim: 2505 190 
## metadata(0):
## assays(1): counts
## rownames(2505): 304309 469478 ... 206906 298806
## rowData names(0):
## colnames(190): C0333.N.518126 C0333.T.518046 ... 32I9UNA9.518098
##   BFJMKNMP.518102
## colData names(71): X.SampleID BarcodeSequence ... HOST_TAXID
##   Description

You may also consider using makeTreeSummarizedExperimentFromPhyloseq from the mia package (example not shown).

5.1 sessionInfo

sessionInfo()
## R Under development (unstable) (2024-01-16 r85808)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] phyloseq_1.47.0             lefser_1.13.4              
##  [3] SummarizedExperiment_1.33.3 Biobase_2.63.0             
##  [5] GenomicRanges_1.55.3        GenomeInfoDb_1.39.8        
##  [7] IRanges_2.37.1              S4Vectors_0.41.4           
##  [9] BiocGenerics_0.49.1         MatrixGenerics_1.15.0      
## [11] matrixStats_1.2.0           BiocStyle_2.31.0           
## 
## loaded via a namespace (and not attached):
##  [1] ade4_1.7-22             tidyselect_1.2.0        libcoin_1.0-10         
##  [4] dplyr_1.1.4             farver_2.1.1            Biostrings_2.71.2      
##  [7] fastmap_1.1.1           TH.data_1.1-2           digest_0.6.34          
## [10] lifecycle_1.0.4         cluster_2.1.6           survival_3.5-8         
## [13] magrittr_2.0.3          compiler_4.4.0          rlang_1.1.3            
## [16] sass_0.4.8              tools_4.4.0             igraph_2.0.2           
## [19] utf8_1.2.4              yaml_2.3.8              data.table_1.15.2      
## [22] knitr_1.45              S4Arrays_1.3.6          labeling_0.4.3         
## [25] DelayedArray_0.29.9     plyr_1.8.9              multcomp_1.4-25        
## [28] abind_1.4-5             withr_3.0.0             grid_4.4.0             
## [31] fansi_1.0.6             multtest_2.59.0         biomformat_1.31.0      
## [34] colorspace_2.1-0        Rhdf5lib_1.25.1         ggplot2_3.5.0          
## [37] scales_1.3.0            iterators_1.0.14        MASS_7.3-60.2          
## [40] cli_3.6.2               mvtnorm_1.2-4           vegan_2.6-4            
## [43] rmarkdown_2.26          crayon_1.5.2            generics_0.1.3         
## [46] reshape2_1.4.4          rhdf5_2.47.5            ape_5.7-1              
## [49] cachem_1.0.8            stringr_1.5.1           zlibbioc_1.49.0        
## [52] modeltools_0.2-23       splines_4.4.0           parallel_4.4.0         
## [55] BiocManager_1.30.22     XVector_0.43.1          vctrs_0.6.5            
## [58] Matrix_1.6-5            sandwich_3.1-0          jsonlite_1.8.8         
## [61] bookdown_0.38           magick_2.8.3            foreach_1.5.2          
## [64] jquerylib_0.1.4         glue_1.7.0              codetools_0.2-19       
## [67] stringi_1.8.3           gtable_0.3.4            munsell_0.5.0          
## [70] tibble_3.2.1            pillar_1.9.0            rhdf5filters_1.15.2    
## [73] htmltools_0.5.7         GenomeInfoDbData_1.2.11 R6_2.5.1               
## [76] evaluate_0.23           lattice_0.22-5          highr_0.10             
## [79] bslib_0.6.1             Rcpp_1.0.12             permute_0.9-7          
## [82] nlme_3.1-164            SparseArray_1.3.4       mgcv_1.9-1             
## [85] coin_1.4-3              xfun_0.42               zoo_1.8-12             
## [88] pkgconfig_2.0.3