1 Introduction

The sitePath package does hierarchical search for fixation events given multiple sequence alignment and phylogenetic tree. These fixation events can be specific to a phylogenetic lineages or shared by multiple lineages. This is achieved by three major steps:

  1. Import tree and sequence alignment
  2. Resolve phylogenetic lineages
  3. Hierarchical search for lineage-dependent fixation events

2 Import tree and sequence alignment

There’re various R packages for parsing phylogenetic tree and multiple sequence alignment files. For now, sitepath accepts phylo object and alignment object. Functions from ggtree and seqinr are able to handle most file formats.

2.1 Parse phylogenetic tree

The S3 phylo class is a common data structure for phylogenetic analysis in R. The CRAN package ape provides basic parsing function for reading tree files. The Bioconductor package ggtree provides more comprehensive parsing utilities.

library(ape)

tree <- read.tree(system.file("extdata", "ZIKV.newick", package = "sitePath"))

It is highly recommended that the file stores a rooted tree as R would consider the tree is rooted by default and re-rooting the tree in R is difficult.

2.2 Parse and add sequence alignment

Most multiple sequence alignment format can be parsed by seqinr. sitePath has a wrapper function for parsing and adding the sequence alignment.

library(sitePath)

alignment_file <- system.file("extdata", "ZIKV.fasta", package = "sitePath")
tree <- addMSA(tree, alignment_file, "fasta")

3 Resolve phylogenetic lineages

The names in tree and aligment must be matched. We use a tip-to-root algorithm to trim tree leaves/tips to expose the major branches. Before finding putative phylogenetic lineages, there involves a few more steps to evalute the impact of threshold on result.

3.1 The impact of threshold on resolving lineages

In the current version, the resolving function only takes sequence similarity as one single threshold. The impact of threshold depends on the tree topology hence there is no universal choice. The function sneakPeak samples thresholds and calculates the resulting number of paths. The use of this function can be of great help in choosing the threshold.

preassessment <- sneakPeek(tree)
plot(preassessment$similarity, preassessment$pathNum)

3.2 Choose a threshold by yourself

Use the function lineagePath to get a S3 sitePath class object1 The S3 sitePath object contains the nodes to represent phylogenetic lineages for downstream analysis. The choice of the threshold really depends. You can use the result from sneakPeak as a reference for threhold choosing. Here we choose 0.995 as an example.

paths <- lineagePath(tree, 0.995)
paths
#> 2 paths

You can visualize the result.

plot(paths, no.margin = TRUE)

4 Hierarchical search for fixation events

Now you’re ready to find fixation events.

4.1 Fixation mutations

The hierarchical search is done by fixationSites function. The function outputs a list of mutations with the sequence names involved before and after the fixation.

fixations <- fixationSites(paths)
fixations
#> Result for 2 paths:
#> 
#> 107 109 139 894 988 1118 1143 2074 2086 2634 2842 3045 3144 3328 3353 3398 
#> No reference sequence specified. Using alignment numbering

If you want to retrieve the tip names involved in the fixation of a site, you can pass the result of fixationSites and the site index to extractSite function. The output is a sitePath object which stores the tip names.

extractSite(fixations, 139)
#> Site 139 may experience fixation on 1 path(s):
#> 
#> S139N (114->243) 
#> 
#> In the bracket are the number of tips involved before and after the fixation

It is also possible to retrieve the tips involved in the fixation of the site.

extractTips(fixations, 139)
#> [[1]]
#>   [1] "ANK57896" "AMD61711" "AQS26698" "APG56458" "AUI42289" "AMR39834"
#>   [7] "AWH65848" "APO08504" "AMX81917" "AVZ47169" "AMX81916" "BBC70847"
#>  [13] "AUF35022" "ATL14618" "AUF35021" "AVG19202" "APG56499" "ASW34302"
#>  [19] "APH11523" "APH11495" "APH11549" "APH11505" "APH11535" "APH11565"
#>  [25] "APH11550" "APH11501" "APH11583" "APH11584" "APH11524" "APH11544"
#>  [31] "APH11504" "APH11587" "APH11527" "APH11533" "APH11558" "APH11586"
#>  [37] "APH11576" "APH11542" "APH11564" "APH11555" "APH11585" "APH11582"
#>  [43] "APH11521" "APH11567" "APH11577" "APH11547" "APH11531" "APH11517"
#>  [49] "APH11548" "APH11561" "APH11553" "APH11515" "APH11554" "APH11540"
#>  [55] "APH11532" "APH11537" "APH11509" "APH11514" "APH11568" "APH11574"
#>  [61] "APH11575" "APH11572" "APH11545" "APH11516" "APH11552" "AOL02459"
#>  [67] "APH11569" "APH11541" "APH11511" "AOO19565" "APH11518" "APH11512"
#>  [73] "APH11593" "APH11546" "APH11539" "APH11534" "APH11508" "APH11581"
#>  [79] "APH11525" "AWK27349" "APH11570" "APH11563" "APH11530" "APH11580"
#>  [85] "APH11579" "APH11557" "APH11594" "APH11559" "APH11592" "APH11590"
#>  [91] "APH11538" "APH11502" "APH11499" "APH11507" "APH11573" "APH11543"
#>  [97] "APH11529" "APH11528" "APH11497" "APH11503" "APH11500" "APH11536"
#> [103] "APH11493" "APH11566" "APH11560" "APH11551" "APH11571" "APH11562"
#> [109] "APH11595" "APH11498" "APH11506" "APH11578" "AVV62004" "BAX00477"
#> attr(,"AA")
#> [1] "S"
#> 
#> [[2]]
#>   [1] "BAV89190"     "AOI20067"     "AMM43325"     "AMM43326"    
#>   [5] "AUI42329"     "AUI42330"     "ANC90425"     "AMT75536"    
#>   [9] "ANF16414"     "AMR68932"     "ANA12599"     "AMM39806"    
#>  [13] "AMR39830"     "AMV49165"     "AMO03410"     "ANO46307"    
#>  [17] "AVG19275"     "ANN44857"     "ANO46306"     "ANO46309"    
#>  [21] "ANO46305"     "ANO46303"     "ARB08102"     "ANO46302"    
#>  [25] "AHZ13508"     "ANO46304"     "ANO46301"     "ANO46308"    
#>  [29] "AOG18296"     "AOO19564"     "AUI42194"     "APC60215"    
#>  [33] "AMQ48986"     "ATG29307"     "ART29828"     "AWF93617"    
#>  [37] "ATG29284"     "ATG29287"     "ATG29303"     "AWF93619"    
#>  [41] "AWF93618"     "AQM74762"     "AUD54964"     "AQM74761"    
#>  [45] "ATG29306"     "ASL68974"     "ATG29267"     "ASL68978"    
#>  [49] "AQX32985"     "ATG29315"     "AQZ41956"     "ARI68105"    
#>  [53] "ASU55505"     "AQZ41949"     "ASL68979"     "ATG29299"    
#>  [57] "ATI21641"     "ATG29270"     "ATG29291"     "AOY08536"    
#>  [61] "ANO46297"     "ANO46298"     "AQZ41950"     "AQZ41951"    
#>  [65] "ARU07183"     "ANG09399"     "AQZ41954"     "AOY08533"    
#>  [69] "AQZ41947"     "AQZ41948"     "ATG29292"     "ATG29295"    
#>  [73] "AOW32303"     "AVZ25033"     "AOC50654"     "AQZ41953"    
#>  [77] "ATG29301"     "ATG29276"     "APO08503"     "AMC13913"    
#>  [81] "AMC13912"     "APO39243"     "APO39229"     "AQZ41952"    
#>  [85] "AQZ41955"     "AMK49165"     "ARB07976"     "APB03018"    
#>  [89] "AMC13911"     "APB03019"     "ASU55416"     "ANK57897"    
#>  [93] "AWH65849"     "AMZ03556"     "ASU55417"     "ANW07476"    
#>  [97] "APY24199"     "AMA12086"     "AMH87239"     "APY24198"    
#> [101] "APO36913"     "ALX35659"     "AOG18295"     "ANQ92019"    
#> [105] "AML81028"     "APY24200"     "AMD16557"     "ARU07074"    
#> [109] "AOX49264"     "AOX49265"     "AOY08518"     "ARB07962"    
#> [113] "AMX81919"     "AMM39805"     "ARX97119"     "AMB37295"    
#> [117] "AMK79468"     "AML82110"     "AMR39831"     "AMX81918"    
#> [121] "ANC90426"     "ALU33341"     "ASB32509"     "AMA12085"    
#> [125] "AMU04506"     "AMA12087"     "AMA12084"     "AQU12485"    
#> [129] "AMS00611"     "AMQ48981"     "AOY08538"     "APH11492"    
#> [133] "AOY08517"     "AOY08541"     "AOO54270"     "AND01116"    
#> [137] "ARB07967"     "ANF04752"     "AOE22997"     "APQ41782"    
#> [141] "APQ41786"     "ASU55393"     "ASU55404"     "ASU55423"    
#> [145] "ANB66182"     "ASU55425"     "ASU55420"     "AQX32986"    
#> [149] "ASU55422"     "APQ41784"     "ANC90428"     "ASU55415"    
#> [153] "ASU55418"     "ARM59239"     "ASU55408"     "ASU55424"    
#> [157] "ASU55390"     "ASU55419"     "ASU55391"     "AMM39804"    
#> [161] "ASU55411"     "ANB66183"     "ASU55421"     "AMZ03557"    
#> [165] "ASU55392"     "AQX32987"     "ASU55403"     "ASU55399"    
#> [169] "APQ41783"     "ANS60026"     "ANB66184"     "ASU55426"    
#> [173] "ASU55412"     "ASU55413"     "ASU55410"     "ASU55397"    
#> [177] "ASU55400"     "ASU55409"     "APB03017"     "ASU55395"    
#> [181] "ASU55396"     "AOY08524"     "ASU55394"     "ASU55414"    
#> [185] "ASU55405"     "AMC33116"     "ASU55406"     "ASU55398"    
#> [189] "ASU55407"     "AMQ34003"     "AMQ34004"     "ASU55401"    
#> [193] "ASU55402"     "ARU07076"     "AMK49164"     "APG56457"    
#> [197] "AOR82892"     "ATB53752"     "ANH10698"     "AOR82893"    
#> [201] "ARU07075"     "AMB18850"     "YP_009428568" "AMQ48982"    
#> [205] "ART29823"     "ASK51714"     "ARB07953"     "APW84876"    
#> [209] "APW84872"     "APW84873"     "AOY08535"     "AVZ25035"    
#> [213] "AOY08537"     "AMN14620"     "APW84874"     "APW84875"    
#> [217] "BAV82373"     "AOS90224"     "ANH22038"     "APW84877"    
#> [221] "AOY08525"     "ASW34087"     "ART29826"     "ART29825"    
#> [225] "AOY08523"     "AOY08542"     "ARB07932"     "APB03020"    
#> [229] "AOS90220"     "APO39232"     "AOS90223"     "APO39237"    
#> [233] "AOY08546"     "AOY08516"     "AOS90221"     "APB03021"    
#> [237] "APO39236"     "APO39233"     "AOS90222"     "AOO53981"    
#> [241] "AOY08521"     "AOO85388"     "APO39228"    
#> attr(,"AA")
#> [1] "N"

4.2 Visualize the result

Use plot to visualize each fixation mutation. You’ll have to specify which mutation to be visualized. Because the names(fixationSites) are the mutation names, you can also use one of them as function input

plotSingleSite(fixations, 139)

5 Additional functions

This part is extra and experimental but might be useful when pre-assessing your data. We’ll use an example to demonstrate.

5.1 Inspect one site

The plotSingleSite function will color the tree according to amino acids if you use the output of lineagePath function.

plotSingleSite(paths, 139)

plotSingleSite(paths, 763)

5.2 SNP sites

An SNP site could potentially undergo fixation event. The SNPsites function predicts possible SNP sites and the result could be what you’ll expect to be fixation mutation.

snps <- SNPsites(tree)
plotSingleSite(paths, snps[4])