Abstract
This Quick-Start is a runnable example showing the functionality of the SpliceWiz workflow.
Version 1.9.0
SpliceWiz is a graphical interface for differential alternative splicing analysis and visualization in R. It differs from other alternative splicing tools in that it is designed for users with basic bioinformatics skills to analyze datasets containing up to hundreds of samples. SpliceWiz contains a number of innovations including:
This vignette is a runnable working example of the SpliceWiz workflow. The purpose is to quickly demonstrate the basic functionalities of SpliceWiz.
We provide here a brief outline of the workflow so that users can get started as quickly as possible. We also provide more details for those wishing to know more: many sections contain extra information that can be displayed when clicked on.
We recommend the following memory requirements (RAM) for running various steps of SpliceWiz:
buildRef()
processBAM()
collateData() (lowMemoryMode=TRUE): 32 gigabytes
collateData() (lowMemoryMode=FALSE): 8 gigabytes per thread
Differential analysis
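As a rough planning aid, the two collateData() figures quoted above can be turned into a small calculation. The helper below, `recommended_collate_ram_gb()`, is hypothetical and not part of SpliceWiz; it only encodes the 32 gigabyte and 8 gigabytes per thread recommendations.

```r
# Hypothetical helper (not a SpliceWiz function): recommended RAM, in
# gigabytes, for collateData(), using only the two figures quoted above.
recommended_collate_ram_gb <- function(n_threads, lowMemoryMode = TRUE) {
    stopifnot(is.numeric(n_threads), length(n_threads) == 1, n_threads >= 1)
    if (lowMemoryMode) {
        32               # flat recommendation when lowMemoryMode = TRUE
    } else {
        8 * n_threads    # 8 gigabytes per thread when lowMemoryMode = FALSE
    }
}

ram_low  <- recommended_collate_ram_gb(4, lowMemoryMode = TRUE)
ram_full <- recommended_collate_ram_gb(8, lowMemoryMode = FALSE)
```

With these figures, running with lowMemoryMode=FALSE on more than 4 threads calls for more than the 32 gigabytes recommended for lowMemoryMode=TRUE.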
SpliceWiz defines alternative splicing events (ASEs) as binary events between two possibilities, the included and excluded isoform. It detects and measures: skipped (cassette) exons (SE), mutually-exclusive exons (MXE), alternative 5’/3’ splice site usage (A5SS / A3SS), alternate first / last exon usage (AFE / ALE), and retained introns (IR or RI).
SpliceWiz uses splice-specific read counts to measure ASEs. Namely, these are junction reads (reads that align across splice sites). The exception is intron retention (IR) whereby the (trimmed) mean read depth across the intron is measured (identical to the method used in IRFinder).
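To illustrate why a trimmed mean is a sensible summary of intron depth, here is a minimal base R sketch. The coverage values and the trim fraction are made up for illustration, and this is not the code SpliceWiz or IRFinder runs internally.

```r
# Toy per-base coverage across one intron (made-up numbers). The two high
# values mimic a feature inside the intron that inflates depth locally.
intron_depth <- c(4, 5, 5, 6, 5, 4, 90, 95, 5, 6, 5, 4)

# Trim fraction chosen for illustration only (an assumption, not a
# documented SpliceWiz setting).
trim_fraction <- 0.3

# The plain mean is pulled upwards by the local spike ...
plain_mean <- mean(intron_depth)

# ... whereas the trimmed mean discards the lowest and highest 30% of
# positions before averaging.
trimmed_mean <- mean(intron_depth, trim = trim_fraction)
```

In this toy example the trimmed mean stays close to the typical depth of about 5, while the plain mean is dominated by the two outlying positions.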
SpliceWiz provides two metrics:
SpliceOver or SpliceMax method (the latter is identical to that used in IRFinder)
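The arithmetic behind such ratio metrics can be sketched in base R. We assume here that the two metrics are a percent-spliced-in (PSI) value and an IRFinder-style IR-ratio, and that SpliceMax takes the larger of the junction counts at the two ends of the intron (as in IRFinder). The counts are made up, the variable names are ours, and SpliceOver is not shown.

```r
# Toy splice-specific counts for one binary event (made-up numbers).
included_count <- 30
excluded_count <- 10

# Assumed PSI definition: fraction of reads supporting the included isoform.
psi <- included_count / (included_count + excluded_count)

# Toy values for one intron (made-up numbers).
intron_depth <- 5    # (trimmed) mean read depth across the intron
splice_left  <- 40   # junction reads using one splice site of the intron
splice_right <- 45   # junction reads using the other splice site

# Assumed SpliceMax-style estimate of spliced abundance.
splice_max <- max(splice_left, splice_right)

# Assumed IR-ratio definition: intron depth relative to intron depth plus
# spliced abundance.
ir_ratio <- intron_depth / (intron_depth + splice_max)
```

Both values lie between 0 and 1; here PSI is 0.75 and the IR-ratio is 0.1.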
Novel splicing events are those in which at least one isoform is not an
annotated transcript in the given gene annotation. SpliceWiz DOES detect
novel splicing events.
It detects novel events by using novel junctions, using pairs of junctions that originate from or terminate at a common coordinate (novel alternate splice site usage).
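The idea of pairing junctions that share a coordinate can be sketched in base R. The junction table and the helper `find_shared()` are hypothetical and only illustrate the grouping logic, not the SpliceWiz implementation.

```r
# Toy junction table (made-up coordinates); `annotated` marks junctions
# that are present in the gene annotation.
junctions <- data.frame(
    chrom     = c("chrZ", "chrZ", "chrZ", "chrZ"),
    start     = c(100, 100, 300, 500),
    end       = c(200, 250, 400, 600),
    annotated = c(TRUE, FALSE, TRUE, TRUE),
    stringsAsFactors = FALSE
)

# Hypothetical helper: group junction rows sharing the coordinate in
# `by_col`, keeping groups of two or more junctions that contain at least
# one novel (unannotated) junction.
find_shared <- function(junc, by_col) {
    key <- paste(junc$chrom, junc[[by_col]])
    groups <- split(seq_len(nrow(junc)), key)
    groups <- groups[lengths(groups) >= 2]
    has_novel <- vapply(
        groups, function(i) any(!junc$annotated[i]), logical(1)
    )
    groups[has_novel]
}

shared_start <- find_shared(junctions, "start")  # common origin
shared_end   <- find_shared(junctions, "end")    # common terminus
```

Here rows 1 and 2 originate from the same coordinate but terminate at different ones, so they form a candidate novel alternate splice site event. Whether it is a 5’ or 3’ splice site event depends on the strand.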
Additionally, SpliceWiz detects “tandem junction reads”. These are reads that span two or more splice junctions. The regions between consecutive splice junctions can then be annotated as novel exons (if they are not identical to annotated exons). These novel exons can then be used to measure novel cassette exon usage.
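As a minimal sketch of how a tandem junction read implies an exon, consider one read spanning two junctions. All coordinates are made up, and we assume junction coordinates denote the first and last intronic base; this is an illustration, not the SpliceWiz implementation.

```r
# One toy read spanning two junctions (made-up coordinates).
tandem <- data.frame(
    j1_start = 100, j1_end = 200,   # first junction (intron 1)
    j2_start = 260, j2_end = 400    # second junction (intron 2)
)

# The region between the two junctions is the candidate exon.
candidate_exon <- c(start = tandem$j1_end + 1, end = tandem$j2_start - 1)

# Toy annotated exons on the same chromosome and strand.
annotated_exons <- data.frame(start = c(50, 401), end = c(99, 480))

# The candidate is novel if it matches no annotated exon exactly.
is_novel <- !any(
    annotated_exons$start == candidate_exon[["start"]] &
        annotated_exons$end == candidate_exon[["end"]]
)

# For a cassette exon event, the two junctions support inclusion, and a
# junction skipping the exon (j1_start to j2_end) supports exclusion.
skipping_junction <- c(start = tandem$j1_start, end = tandem$j2_end)
```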
The basic steps of SpliceWiz are as follows:
To install SpliceWiz, start R (devel version) and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SpliceWiz")
For those wishing to set up a self-contained environment with SpliceWiz
installed (e.g. on a high performance cluster), we recommend using
miniconda. For installation instructions, see the documentation on
how to install miniconda.
After installing miniconda, create a conda environment as follows:
After following the prompts, activate the environment:
Next, install R 4.2.1 as follows:
NB: We have not been able to successfully use r-base=4.3, so we recommend using r-base=4.2.1 (until further notice).
Many of SpliceWiz’s dependencies are up-to-date from the conda-forge channel, so they are best installed via conda:
conda install -c conda-forge r-devtools r-essentials r-xml r-biocmanager \
    r-fst r-plotly r-rsqlite r-rcurl
After this is done, the remainder of the packages need to be installed from the R terminal. This is because most Bioconductor packages are from the bioconda channel and appear not to be routinely updated.
So, let's enter the R terminal from the command line:
Set up Bioconductor 3.16 (which is the latest version compatible with R 4.2):
Again, follow the prompts to update any necessary packages.
Once this is done, install SpliceWiz (devel) from GitHub:
The last step will install any remaining dependencies, taking approximately 20-30 minutes depending on your system.
For macOS users, make sure OpenMP libraries are installed correctly. We recommend users follow this guide, but the quickest way to get started is to install libomp via brew:
SpliceWiz uses established statistical tools to perform alternative
splicing differential analysis:
To install all of these packages:
NxtIRFdata data package.
This data package contains the example “chrZ” genome / annotations and 6 example BAM files that are used in this working example. Also, NxtIRFdata provides pre-generated mappability exclusion annotations for building human and mouse SpliceWiz references.
SpliceWiz offers a graphical user interface (GUI) for interactive users, e.g. in the RStudio environment. To start using SpliceWiz GUI:
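A minimal sketch of launching the GUI in demo mode is shown below. It assumes SpliceWiz is installed and uses the `spliceWiz()` function with its `demo` argument; the guard variable `can_launch_gui` is ours and simply prevents the call outside an interactive session.

```r
# Only attempt to open the GUI in an interactive session (e.g. RStudio)
# where the SpliceWiz package is installed.
can_launch_gui <- interactive() &&
    requireNamespace("SpliceWiz", quietly = TRUE)

if (can_launch_gui) {
    # demo = TRUE exposes the buttons that load the example data
    SpliceWiz::spliceWiz(demo = TRUE)
}
```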
SpliceWiz first needs to generate a set of reference files. The
SpliceWiz reference is used to quantitate alternative splicing in BAM
files, as well as in downstream collation, differential analysis and
visualisation.
Using the example FASTA and GTF files, use the buildRef() function to build the SpliceWiz reference:
ref_path <- file.path(tempdir(), "Reference")
buildRef(
    reference_path = ref_path,
    fasta = chrZ_genome(),
    gtf = chrZ_gtf(),
    ontologySpecies = "Homo sapiens"
)
The SpliceWiz reference can be viewed as data frames using various getter functions. For example, to view the annotated alternative splicing events (ASEs):
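A minimal sketch using the `viewASE()` getter is shown below. It assumes the reference was built at the same `ref_path` as in the buildRef() call above and that SpliceWiz is installed; the guard variable `ref_ready` is ours.

```r
# Same location as used for buildRef() above
ref_path <- file.path(tempdir(), "Reference")

# Only query the reference if SpliceWiz is installed and the reference
# folder created by buildRef() exists.
ref_ready <- requireNamespace("SpliceWiz", quietly = TRUE) &&
    dir.exists(ref_path)

if (ref_ready) {
    # Annotated alternative splicing events as a data frame
    df <- SpliceWiz::viewASE(ref_path)
    print(head(df))
}
```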
See ?View-Reference-methods for a comprehensive list of getter functions.
After starting the SpliceWiz GUI in demo mode, click the Reference tab from the menu side bar. The following interface will be shown:
Click Load Demo FASTA/GTF (5), and then click Build Reference (6).
The helper functions chrZ_genome() and chrZ_gtf() return the paths to the example genome (FASTA) and transcriptome (GTF) files included with the NxtIRFdata package, which contains the working example used by SpliceWiz:
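A short sketch that prints these paths is shown below. It assumes the NxtIRFdata package is installed and calls the helpers through its namespace; the variable names are ours, and the actual paths depend on your system.

```r
# Only query the example files if the data package is installed.
data_available <- requireNamespace("NxtIRFdata", quietly = TRUE)

if (data_available) {
    fasta_path <- NxtIRFdata::chrZ_genome()   # example genome (FASTA)
    gtf_path   <- NxtIRFdata::chrZ_gtf()      # example annotation (GTF)
    print(c(fasta = fasta_path, gtf = gtf_path))
}
```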
SpliceWiz supports gene ontology analysis. To enable this capability, we
first need to generate the gene ontology annotations for the appropriate
species.
To see a list of supported species:
getAvailableGO()
#> [1] "Anopheles gambiae"
#> [2] "Arabidopsis thaliana"
#> [3] "Bos taurus"
#> [4] "Canis familiaris"
#> [5] "Gallus gallus"
#> [6] "Pan troglodytes"
#> [7] "Escherichia coli"
#> [8] "Drosophila melanogaster"
#> [9] "Homo sapiens"
#> [10] "Mus musculus"
#> [11] "Sus scrofa"
#> [12] "Rattus norvegicus"
#> [13] "Macaca mulatta"
#> [14] "Caenorhabditis elegans"
#> [15] "Xenopus laevis"
#> [16] "Saccharomyces cerevisiae"
#> [17] "Danio rerio"
#> [18] "Triticum aestivum"
#> [19] "Triticum aestivum_subsp._aestivum"
#> [20] "Triticum vulgare"
#> [21] "Brassica napus"
#> [22] "Arachis hypogaea"
#> [23] "Hibiscus syriacus"
#> [24] "Acridium cancellatum"
#> [25] "Schistocerca cancellata"
#> [26] "Triticum dicoccoides"
#> [27] "Triticum turgidum_subsp._dicoccoides"
#> [28] "Triticum turgidum_var._dicoccoides"
#> [29] "Dendrohyas sarda"
#> [30] "Hyla arborea_sarda"
#> [31] "Hyla sarda"
#> [32] "Locusta gregaria"
#> [33] "Schistocerca gregaria"
#> [34] "Gossypium hirsutum"
#> [35] "Gossypium hirsutum_subsp._mexicanum"
#> [36] "Gossypium lanceolatum"
#> [37] "Gossypium purpurascens"
#> [38] "Camelina sativa"
#> [39] "Myagrum sativum"
#> [40] "Carassius auratus_gibelio"
#> [41] "Carassius gibelio_gibelio"
#> [42] "Carassius gibelio"
#> [43] "Carassius gibelio_subsp._gibelio"
#> [44] "Cyprinus gibelio"
#> [45] "Schistocerca piceifrons"
#> [46] "Papaver somniferum"
#> [47] "Zingiber officinale"
#> [48] "Trichomonas vaginalis_G3"
#> [49] "Trichomonas vaginalis_strain_G3"
#> [50] "Helianthus annuus"
#> [51] "Schistocerca americana"
#> [52] "Acipenser ruthenus"
#> [53] "Schistocerca serialis_cubense"
#> [54] "Panicum virgatum"
#> [55] "Nicotiana tabacum"
#> [56] "Oncorhynchus mykiss"
#> [57] "Oncorhynchus nerka_mykiss"
#> [58] "Parasalmo mykiss"
#> [59] "Salmo mykiss"
#> [60] "Schistocerca nitens"
#> [61] "Schistocerca vaga"
#> [62] "Salvia splendens"
#> [63] "Carassius carassius"
#> [64] "Cyprinus carassius"
#> [65] "Vicia villosa"
#> [66] "Camellia sinensis"
#> [67] "Thea sinensis"
#> [68] "Oncorhynchus keta"
#> [69] "Salmo keta"
#> [70] "Pisum sativum"
#> [71] "Salmo salar"
#> [72] "Raphanus sativus"
#> [73] "Oncorhynchus kisutch"
#> [74] "Oncorhyncus kisutch"
#> [75] "Salmo kisatch"
#> [76] "Lolium rigidum"
#> [77] "Aegilops squarrosa_subsp._squarrosa"
#> [78] "Aegilops squarrosa"
#> [79] "Aegilops tauschii"
#> [80] "Patropyrum tauschii_subsp._tauschii"
#> [81] "Patropyrum tauschii"
#> [82] "Triticum aegilops"
#> [83] "Triticum tauschii"
#> [84] "Salmo trutta"
#> [85] "Cryptomeria japonica"