1 Introduction

Sequence logos are widely used as a graphical representation of nucleic acid sequence motifs. The R package seqLogo (Bembom 2006) draws sequence logos for a single DNA motif. Another R package, motifStack (Ou et al. 2018), depicts individual sequence motifs as well as multiple motifs for amino acid (AA), DNA, and RNA sequences.

IceLogo (Colaert et al. 2009) is a Java tool for visualizing significantly conserved sequence patterns in a list of aligned AA sequences against a set of background sequences. Whereas WebLogo (Crooks et al. 2004) relies on information theory, IceLogo builds on probability theory. IceLogo is reported to be more dynamic and more appropriate for the analysis of conserved sequence patterns.
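To make the contrast between the two approaches concrete, here is a minimal base R sketch on toy data. The peptides, the helper names `pos_freq()` and `info_content()`, and the use of Fisher's exact test are all assumptions made for illustration; the code does not reproduce the actual implementation of WebLogo, IceLogo, or dagLogo. It computes a per-position information content (information-theoretic view) and a per-position frequency difference against a background with a significance test (probability-based view).

```r
# Toy aligned peptides (hypothetical data, for illustration only)
foreground <- c("AKRSL", "GKRSV", "AKRTL", "SKRSI")
background <- c("ALGSV", "GKDTL", "STRAI", "VAESL", "LKGTA", "AEDSG")

aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Frequency matrix: rows = amino acids, columns = alignment positions
pos_freq <- function(seqs) {
  chars <- do.call(rbind, strsplit(seqs, ""))
  freq <- vapply(
    seq_len(ncol(chars)),
    function(j) as.numeric(table(factor(chars[, j], levels = aa))) / nrow(chars),
    numeric(length(aa))
  )
  rownames(freq) <- aa
  freq
}

# Information-theoretic view: bits per position, log2(20) + sum(p * log2(p)),
# without any small-sample correction and without using a background set
info_content <- function(freq) {
  plogp <- ifelse(freq > 0, freq * log2(freq), 0)
  log2(nrow(freq)) + colSums(plogp)
}

fg <- pos_freq(foreground)
bg <- pos_freq(background)
ic <- info_content(fg)

# Probability-based view: difference in frequency relative to the background
delta <- fg - bg

# Significance of K at position 2, using Fisher's exact test as a stand-in
k_fg <- sum(substr(foreground, 2, 2) == "K")
k_bg <- sum(substr(background, 2, 2) == "K")
tab <- matrix(
  c(k_fg, length(foreground) - k_fg, k_bg, length(background) - k_bg),
  nrow = 2
)
p_value <- fisher.test(tab)$p.value
```

In this sketch `ic` depends only on the foreground peptides, whereas `delta` and `p_value` change whenever a different background set is supplied, which is the property the probability-based approach exploits.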

However, IceLogo can only compare conserved sequences with reference sequences at the level of individual amino acids. Some sequence patterns are not conserved at the individual amino acid level but are conserved at the level of amino acid groups that share physical and chemical properties, such as charge and hydrophobicity.
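The following base R sketch illustrates this point on toy data. The peptides and the charge assignment (including treating histidine as positively charged) are assumptions chosen for illustration and are not taken from any grouping scheme shipped with a package. At each position it reports the share of the most common residue and the share of the most common charge group.

```r
# Toy aligned peptides (hypothetical data, for illustration only)
peptides <- c("AKDL", "GRES", "VKDT", "SRET")

# Simplified charge classes; residues not listed are treated as neutral
charge <- c(K = "positive", R = "positive", H = "positive",
            D = "negative", E = "negative")

# Character matrix: rows = peptides, columns = positions
residues <- do.call(rbind, strsplit(peptides, ""))

# Recode every residue by its charge group (column-major order is preserved)
grp <- unname(charge[as.vector(residues)])
grp[is.na(grp)] <- "neutral"
groups <- matrix(grp, nrow = nrow(residues))

# Share of the most frequent symbol at each position
max_share <- function(m) {
  apply(m, 2, function(col) max(table(col)) / length(col))
}

residue_share <- max_share(residues)
group_share <- max_share(groups)
```

At position 2 the residues alternate between K and R, so no single residue exceeds half of the peptides, yet every peptide carries a positively charged residue there. A comparison restricted to individual residues would understate that conservation.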

Here we present dagLogo, an R/Bioconductor package for visualizing significantly conserved sequence patterns relative to a proper background set of sequences, with or without grouping amino acid residues based on their physical and chemical properties. Figure 1 shows the flowchart of an analysis using dagLogo. Compared with existing tools, dagLogo (1) accepts aligned or unaligned subsequences of different lengths as input; (2) provides more options and functions for generating background sets that can be tailored to the experimental design; (3) can plot both significantly over- and under-represented amino acid residues; and (4) can group AA residues and perform statistical significance tests at the group level.
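As a rough illustration of what generating a background set can involve, the base R sketch below draws random fixed-width windows from a toy proteome. The sequences, the helper name `sample_windows()`, and the uniform sampling strategy are assumptions for illustration only; this is not dagLogo's implementation, and the package's own functions should be used in a real analysis.

```r
set.seed(1)  # make the random draw reproducible

# Toy proteome (hypothetical sequences, for illustration only)
proteome <- c(P1 = "MKRSLAAGKDTLSTRAIVAESL",
              P2 = "MLKGTAAEDSGKRTLQQ",
              P3 = "MSTKRSIPLVDE")

# Draw n random windows of a given width from sequences that are long enough
sample_windows <- function(seqs, width, n) {
  ok <- seqs[nchar(seqs) >= width]
  picked <- sample(ok, n, replace = TRUE)
  # One random valid start position per picked sequence
  starts <- vapply(
    nchar(picked) - width + 1,
    function(m) sample.int(m, 1),
    integer(1)
  )
  unname(substring(picked, starts, starts + width - 1))
}

bg_windows <- sample_windows(proteome, width = 5, n = 100)
```

Restricting `seqs` to a subset of proteins, or anchoring the windows on a particular residue, would be ways of tailoring such a background to the experimental design.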

Figure 1. Flowchart of an analysis using dagLogo. The two ways to prepare a Proteome object are colored greenish and yellowish, while the two alternative ways to build a dagPeptides object are colored blue and red.

3 Session Info

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] Biostrings_2.70.0   GenomeInfoDb_1.38.0 XVector_0.42.0     
##  [4] IRanges_2.36.0      S4Vectors_0.40.0    motifStack_1.46.0  
##  [7] UniProt.ws_2.42.0   RSQLite_2.3.1       BiocGenerics_0.48.0
## [10] biomaRt_2.58.0      dagLogo_1.40.0      BiocStyle_2.30.0   
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.1.3                   bitops_1.0-7               
##   [3] rlang_1.1.1                 magrittr_2.0.3             
##   [5] ade4_1.7-22                 matrixStats_1.0.0          
##   [7] compiler_4.3.1              reshape2_1.4.4             
##   [9] png_0.1-8                   vctrs_0.6.4                
##  [11] stringr_1.5.0               pkgconfig_2.0.3            
##  [13] crayon_1.5.2                fastmap_1.1.1              
##  [15] magick_2.8.1                dbplyr_2.3.4               
##  [17] caTools_1.18.2              utf8_1.2.4                 
##  [19] Rsamtools_2.18.0            rmarkdown_2.25             
##  [21] pracma_2.4.2                tzdb_0.4.0                 
##  [23] DirichletMultinomial_1.44.0 bit_4.0.5                  
##  [25] xfun_0.40                   zlibbioc_1.48.0            
##  [27] cachem_1.0.8                CNEr_1.38.0                
##  [29] jsonlite_1.8.7              progress_1.2.2             
##  [31] blob_1.2.4                  DelayedArray_0.28.0        
##  [33] BiocParallel_1.36.0         jpeg_0.1-10                
##  [35] parallel_4.3.1              prettyunits_1.2.0          
##  [37] R6_2.5.1                    bslib_0.5.1                
##  [39] stringi_1.7.12              RColorBrewer_1.1-3         
##  [41] rtracklayer_1.62.0          GenomicRanges_1.54.0       
##  [43] jquerylib_0.1.4             Rcpp_1.0.11                
##  [45] bookdown_0.36               SummarizedExperiment_1.32.0
##  [47] knitr_1.44                  base64enc_0.1-3            
##  [49] R.utils_2.12.2              readr_2.1.4                
##  [51] BiocBaseUtils_1.4.0         Matrix_1.6-1.1             
##  [53] tidyselect_1.2.0            abind_1.4-5                
##  [55] yaml_2.3.7                  codetools_0.2-19           
##  [57] curl_5.1.0                  rjsoncons_1.0.0            
##  [59] plyr_1.8.9                  lattice_0.22-5             
##  [61] tibble_3.2.1                Biobase_2.62.0             
##  [63] KEGGREST_1.42.0             evaluate_0.22              
##  [65] BiocFileCache_2.10.0        xml2_1.3.5                 
##  [67] pillar_1.9.0                BiocManager_1.30.22        
##  [69] filelock_1.0.2              MatrixGenerics_1.14.0      
##  [71] generics_0.1.3              grImport2_0.3-0            
##  [73] RCurl_1.98-1.12             hms_1.1.3                  
##  [75] ggplot2_3.4.4               munsell_0.5.0              
##  [77] scales_1.2.1                xtable_1.8-4               
##  [79] gtools_3.9.4                glue_1.6.2                 
##  [81] pheatmap_1.0.12             seqLogo_1.68.0             
##  [83] tools_4.3.1                 TFMPvalue_0.0.9            
##  [85] BiocIO_1.12.0               BSgenome_1.70.0            
##  [87] annotate_1.80.0             GenomicAlignments_1.38.0   
##  [89] XML_3.99-0.14               Cairo_1.6-1                
##  [91] poweRlaw_0.70.6             TFBSTools_1.40.0           
##  [93] AnnotationDbi_1.64.0        colorspace_2.1-0           
##  [95] GenomeInfoDbData_1.2.11     restfulr_0.0.15            
##  [97] cli_3.6.1                   rappdirs_0.3.3             
##  [99] fansi_1.0.5                 S4Arrays_1.2.0             
## [101] dplyr_1.1.3                 gtable_0.3.4               
## [103] R.methodsS3_1.8.2           sass_0.4.7                 
## [105] digest_0.6.33               SparseArray_1.2.0          
## [107] rjson_0.2.21                htmlwidgets_1.6.2          
## [109] R.oo_1.25.0                 memoise_2.0.1              
## [111] htmltools_0.5.6.1           lifecycle_1.0.3            
## [113] httr_1.4.7                  GO.db_3.18.0               
## [115] bit64_4.0.5                 MASS_7.3-60

Bembom, Oliver. 2006. “seqLogo: Sequence Logos for DNA Sequence Alignments.” R Package Version 1.5.4.

Colaert, Niklaas, Kenny Helsens, Lennart Martens, Joel Vandekerckhove, and Kris Gevaert. 2009. “Improved Visualization of Protein Consensus Sequences by iceLogo.” Nature Methods 6 (11): 786–87.

Crooks, Gavin E., Gary Hon, John-Marc Chandonia, and Steven E. Brenner. 2004. “WebLogo: A Sequence Logo Generator.” Genome Research 14 (6): 1188–90.

Ou, Jianhong, Scot A Wolfe, Michael H Brodsky, and Lihua Julie Zhu. 2018. “MotifStack for the Analysis of Transcription Factor Binding Site Evolution.” Nature Methods 15 (1): 8.