1 Introduction

Welcome to RiboCrypt. RiboCrypt is an R package for interactive visualization in genomics. RiboCrypt works with any NGS-based method, but the emphasis is on Ribo-seq data visualization.

This tutorial will walk you through usage of the app.

The RiboCrypt app currently supports creating interactive browser views for NGS tracks, using ORFik, RiboCrypt and massiveNGSpipe as backends.

2 Browser

The browser is the main coverage plot display page. It contains a click panel on the left side and display panels on the right. It displays coverage of NGS data in either transcript coordinates (default), or genomic coordinates (like IGV). Each part will now be explained:

2.1 Display panel (browser)

The display panel shows the primary settings (study, gene, sample, etc.). The available select boxes are:

2.1.1 Experiment selection

  • Select an organism: Either select “ALL” to keep all experiments, or select a specific organism to display only that subset of experiments in the experiment select tab.
  • Select an experiment: Experiment names combine the study name with the organism (some studies are multi-species, so one study sometimes has multiple experiments). Select which one you want. Merged experiments also exist (all samples merged for the organism, etc.).

2.1.2 Gene selection

  • Select a gene: A gene can currently be selected using:
    • Gene id (ENSEMBL)
    • Gene symbol (hgnc, etc)
  • Select a transcript: A transcript isoform of the gene selected above; the default is the Ensembl canonical isoform. Can be selected using:
    • Transcript id (ENSEMBL)

2.1.3 Library selection

Each experiment usually has multiple libraries. Select which ones to display. By default, if you select multiple libraries, they are shown one below the other.

Libraries are by default named by combining:

  • Library type (RFP, RNA etc),
  • Condition (WT, KO (wild type, knock out ) etc)
  • Stage/timepoint (5h, 1d (5 hours, 1 day) etc)
  • Fraction (chx, cytosolic, ATF4 (ribosomal inhibitor, cell fraction, gene) etc)
  • Replicate (technical/biological replicate number (r1, r2, r3))

The resulting name could be:

  • RFP_WT_5h_chx_cytosolic_r1

It is common that when the condition is KO (knockout), the fraction column contains a gene name (the name of the gene that was knocked out).

Currently, the best way to find the SRR run number for a given sample is to go to the metadata tab and search for the study.
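To make the naming scheme concrete, here is a minimal Python sketch that splits a library name back into its parts. It is not part of RiboCrypt or ORFik; `LibraryName` and `parse_library_name` are hypothetical names, and the sketch assumes the fields are underscore-separated, always in the order listed above, with everything between the stage and the replicate treated as the fraction. Names that omit a field would not parse with this sketch.

```python
from typing import NamedTuple


class LibraryName(NamedTuple):
    """Hypothetical container for the five parts of a library name."""
    library_type: str
    condition: str
    stage: str
    fraction: str
    replicate: str


def parse_library_name(name: str) -> LibraryName:
    """Split an underscore-separated library name into its parts.

    Assumption: the first three fields are library type, condition and
    stage, the last field is the replicate, and everything in between
    belongs to the fraction (which may hold several tokens).
    """
    fields = name.split("_")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}: {name!r}")
    return LibraryName(
        library_type=fields[0],
        condition=fields[1],
        stage=fields[2],
        fraction="_".join(fields[3:-1]),
        replicate=fields[-1],
    )


parsed = parse_library_name("RFP_WT_5h_chx_cytosolic_r1")
print(parsed)
```

For the example name, the fraction comes out as `chx_cytosolic`, since both the inhibitor and the cell fraction sit in the fraction slot.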

2.1.4 View mode

  • Select frames display type:
    • lines (single line, most clear for middle distance (> 100 nt))
    • columns (single point bars, most clear for single nt resolution)
    • stacks (Area under curve, stacked, most clear for long distance (> 1000 nt))
    • area (Area under curve, with alpha (see-through), most clear for long distance (> 1000 nt))
  • K-mer length: When looking at a large region (> 100 nt), raw coverage can be hard to inspect. With a K-mer length > 1 (9 is a good starting point), patterns over larger regions become easier to see.
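The effect of the K-mer length can be illustrated with a sliding window over a coverage vector. The sketch below is an assumption about the general idea, not RiboCrypt's actual implementation: `kmer_smooth` is a hypothetical helper that takes the mean over each window of `k` consecutive positions, and the app may aggregate differently (for example, per frame or with a sum).

```python
def kmer_smooth(coverage: list[float], k: int) -> list[float]:
    """Return the mean of each window of k consecutive positions.

    Assumption: a plain sliding-window mean; k = 1 returns the
    coverage unchanged.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if k > len(coverage):
        return []
    window_sum = sum(coverage[:k])
    smoothed = [window_sum / k]
    # Slide the window one position at a time, updating the running sum.
    for i in range(k, len(coverage)):
        window_sum += coverage[i] - coverage[i - k]
        smoothed.append(window_sum / k)
    return smoothed


counts = [0, 9, 0, 0, 12, 0, 0, 3, 0]
print(kmer_smooth(counts, 3))  # [3.0, 3.0, 4.0, 4.0, 4.0, 1.0, 1.0]
```

The spiky single-nucleotide signal becomes a flatter profile, which is what makes long regions easier to read.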

2.2 Display panel (settings)

Here additional options are shown:

  • 5’ extension (extend viewed window upstream, outside defined region)
  • 3’ extension (extend viewed window downstream, outside defined region)
  • Genomic View (Activate/deactivate genomic view, giving splice information and correct positions in genome, but a lot harder to understand)
  • Protein structures (If you click the annotation name of a transcript in the plot panel, it will display the AlphaFold protein structure, colored by the Ribo-seq data displayed in the plot panel)
  • Full annotation (display full annotation or just the tx you selected)
  • Summary top track (Add an additional plot track on top, summarizing all selected libraries)
  • Select Summary display type (same as frames display type above, but for the summary track)
  • Export format (The file format used when you hover over the top-right corner of the plot and click the export (camera) button)

2.3 Plot panel

When you press “plot”, the data is displayed according to the options specified in the display panel. The plot contains the following parts:

  1. Ribo-seq data (top): the single or multi-track data is displayed on top. By default Ribo-seq is displayed in 3 colors, where
    • red is the 0 frame, the start frame of the reference transcript
    • green is the +1 frame
    • blue is the +2 frame
  2. Sequence track (top middle): displays the DNA sequence when zoomed in (< 100 nt)
  3. Annotation track (middle): displays the transcript annotation, together with black bars that are displayed on top of the data track
  4. Frame track (bottom): the 3 frames displayed with the given color bars:
    • white (Start codons)
    • black (Stop codons)
    • purple (Custom motifs)

When zoomed in, the amino acid sequence is displayed within each frame.
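The frame coloring of the Ribo-seq track can be sketched as follows. This is an illustration, not RiboCrypt code: `frame_of` and `split_by_frame` are hypothetical helpers, and the sketch assumes each position's frame is its offset from the start of the reference transcript modulo 3.

```python
FRAME_COLORS = {0: "red", 1: "green", 2: "blue"}


def frame_of(position: int, reference_start: int = 0) -> int:
    """Frame (0, 1 or 2) of a position relative to the reference start."""
    return (position - reference_start) % 3


def split_by_frame(coverage: list[int], reference_start: int = 0) -> dict[str, list[int]]:
    """Split one coverage vector into three per-frame tracks keyed by color."""
    tracks = {color: [0] * len(coverage) for color in FRAME_COLORS.values()}
    for pos, count in enumerate(coverage):
        color = FRAME_COLORS[frame_of(pos, reference_start)]
        tracks[color][pos] = count
    return tracks


coverage = [5, 1, 0, 7, 2, 1]
tracks = split_by_frame(coverage)
print(tracks["red"])    # [5, 0, 0, 7, 0, 0]
print(tracks["green"])  # [0, 1, 0, 0, 2, 0]
print(tracks["blue"])   # [0, 0, 0, 0, 0, 1]
```

In this toy example most reads fall in the red track, the kind of 3-nucleotide periodicity one would expect over a translated region in the 0 frame.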

3 Analysis

This section collects the available analyses, which usually operate at whole-genome scale.

3.1 Codon analysis

This tab displays a heatmap of percentage usage of codons over all genes selected, for both A and P sites.

3.1.1 Display panel (codon)

Study and gene selection work the same as for the browser described above, with the additional option to select all genes (default).

  • Select libraries (multiple allowed)

3.1.1.1 Filters

  • Codon filter value (Minimum reads in ORF to be included)
  • Codon score, all scores are normalized for both codon and count per gene level (except for sum):
    • percentage (percentage use relative to max codon)
    • dispersion(NB) (negative binomial dispersion values)
    • alpha(DMN) (Dirichlet-multinomial distribution alpha parameter)
    • sum (raw sum; a very biased estimator, since some codons are used much more than others)
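As a rough illustration of the filter and the percentage score, the sketch below drops ORFs with too few reads, pools the remaining codon counts, and expresses each codon as a percentage of the most used one. It is a simplification under stated assumptions: `codon_percentages` is a hypothetical function, the input is a plain dictionary of per-gene codon counts, and the per-gene normalization mentioned above is deliberately left out.

```python
def codon_percentages(
    orfs: dict[str, dict[str, int]], min_reads: int
) -> dict[str, float]:
    """Percentage use of each codon relative to the most used codon.

    Assumption: ORFs with fewer than min_reads total reads are dropped,
    and the remaining counts are pooled without per-gene normalization.
    """
    totals: dict[str, int] = {}
    for codon_counts in orfs.values():
        if sum(codon_counts.values()) < min_reads:
            continue  # codon filter value: skip low-count ORFs
        for codon, count in codon_counts.items():
            totals[codon] = totals.get(codon, 0) + count
    top = max(totals.values(), default=0)
    if top == 0:
        return {codon: 0.0 for codon in totals}
    return {codon: 100.0 * count / top for codon, count in totals.items()}


orfs = {
    "geneA": {"AAA": 40, "GAG": 160},
    "geneB": {"AAA": 10, "GAG": 40, "CTG": 100},
    "geneC": {"AAA": 1},  # below the filter, excluded
}
scores = codon_percentages(orfs, min_reads=10)
print(scores)  # {'AAA': 25.0, 'GAG': 100.0, 'CTG': 50.0}
```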

3.2 Heatmap

This tab displays a heatmap of coverage per read length at a specific region (like the start site of coding sequences) over all genes selected.
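The matrix behind such a heatmap can be sketched as a count table with one row per read length and one column per position relative to the anchor (for example, the start codon). This is only an illustration: `coverage_by_read_length` is a hypothetical helper, and the input format, a list of (relative position, read length) pairs pooled over all selected genes, is an assumption.

```python
from collections import defaultdict


def coverage_by_read_length(
    reads: list[tuple[int, int]], upstream: int, downstream: int
) -> dict[int, list[int]]:
    """Count reads per read length and position relative to an anchor.

    reads: (relative_position, read_length) pairs, where position 0 is
    the anchor. Column index 0 corresponds to position -upstream.
    """
    width = upstream + downstream + 1
    matrix: dict[int, list[int]] = defaultdict(lambda: [0] * width)
    for rel_pos, length in reads:
        if -upstream <= rel_pos <= downstream:
            matrix[length][rel_pos + upstream] += 1
    # Return rows ordered by read length, as a plain dict.
    return {length: matrix[length] for length in sorted(matrix)}


reads = [(-12, 28), (-12, 28), (-12, 29), (0, 28), (3, 29), (40, 30)]
heatmap = coverage_by_read_length(reads, upstream=12, downstream=3)
for length, row in heatmap.items():
    print(length, row)
```

Reads outside the window (here the one at position 40) are ignored, so read lengths that only occur outside the region do not get a row.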