Contents

Abstract

autonomics is created to make cross-platform omics analysis intuitive and enjoyable. It contains import + preprocess + analyze + visualize one-liners for RNAseq, MS proteomics, SOMAscan and Metabolon platforms and a generic importer for data from any rectangular omics file. With a focus on not only automation but also customization, these importers have a flexible formula/ block/contrastdefs interface which allows to define any statistical formula, fit any general linear model, quantify any contrast, and use random effects or precision weights if required, building on top of the powerful limma platform for statistical analysis. It also offers exquisite support for analyzing complex designs such as the innovative contrastogram visualization to unravel and summarize the differential expression structure in complex designs. By autonomating repetitive tasks, generifying common steps, and intuifying complex designs, it makes cross-platform omics data analysis a fun experience. Try it and enjoy :-).

Autonomics offers import + preprocess + analyze + visualize one-liners for RNAseq, MS proteomics, SOMAscan and Metabolon platforms, as well a generic importer for data from any rectangular omics file. We will discuss each of these in more detail in the following sections, but all of them follow the following general steps:

1 RNAseq

Autonomics provides ready-to-use importers for both count as well as BAM files, which read / preprocess / analyze RNAseq data. Specific to RNAseq is counting reads using Rsubread::featureCounts (for read_rna_bams)), normalizing for library composition (cpm) or gene length (tpm), estimating voom precision weights, and using these in linear modeling. Let’s illustrate both of these on example datasets:

read_rnaseq_counts

Billing et al. (2019) studied the differentiation of embryonic (E00) into mesenchymal stem cells (E00.8, E01, E02, E05, E15, E30), with similar properties as bone-marrow mesenchymal stem cells (M00). Importing the RNAseq counts:

require(autonomics, quietly = TRUE)
## 
## Attaching package: 'autonomics'
## The following objects are masked from 'package:stats':
## 
##     biplot, loadings
file <- system.file('extdata/billing19.rnacounts.txt', package = 'autonomics')
object <-  read_rnaseq_counts(file = file, fit = 'limma', plot = TRUE, label = 'gene')
##  Read /tmp/RtmphdRJQu/Rinst171f806f813fd2/autonomics/extdata/billing19.rnacounts.txt
##               Infer subgroup from sample_ids
##               Infer subgroup from sample_ids
##      Preprocess
##          Keep 24/24 features: count >= 10 (~subgroup)
##          counts: add 0.5
##          cpm:    tmm scale libsizes
##              cpm
##          voom: ~subgroup
##          counts: rm 0.5
##               Add PCA
##     LinMod
##               Code subgroup
##                                level
##                     coefficient E00 E00.8 E01 E02 E05 E15 E30 M00
##                       Intercept  1   .     .   .   .   .   .   . 
##                       E00.8     -1   1     .   .   .   .   .   . 
##                       E01       -1   .     1   .   .   .   .   . 
##                       E02       -1   .     .   1   .   .   .   . 
##                       E05       -1   .     .   .   1   .   .   . 
##                       E15       -1   .     .   .   .   1   .   . 
##                       E30       -1   .     .   .   .   .   1   . 
##                       M00       -1   .     .   .   .   .   .   1
##               lmFit(~subgroup, weights = assays(object)$weights)
##                      coefficient    fit downfdr upfdr downp   upp
##                           <fctr> <fctr>   <int> <int> <int> <int>
##                   1:   Intercept  limma       0    24     0    24
##                   2:       E00.8  limma       1     2     1     4
##                   3:         E01  limma       3     2     5     3
##                   4:         E02  limma       2     2     3     4
##                   5:         E05  limma       1     2     2     2
##                   6:         E15  limma       8     9     8    10
##                   7:         E30  limma       7    10     8    10
##                   8:         M00  limma       6     7     6     7
##      Plot summary

read_rnaseq_counts(sfile)

The sfile/sfileby arguments in both importers allow to merge in sample data from a separate file by a specified column. This is especially useful in clinical datasets, such as the GSE161731 Covid-19 dataset, for which both a counts file as well as a sample file can be downloaded from GEO:

basedir <- file.path(tempdir(), 'datasets')
dir.create(basedir, showWarnings = FALSE, recursive = TRUE)
if (!dir.exists(file.path(basedir, 'GSE161731'))){
    sink <- GEOquery::getGEOSuppFiles("GSE161731", baseDir = basedir)
}
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)
object <- read_rnaseq_counts(
  file    = file.path(basedir, 'GSE161731/GSE161731_counts.csv.gz'), 
  sfile   = file.path(basedir, 'GSE161731/GSE161731_counts_key.csv.gz'), 
  by.y    = 'rna_id', 
  formula = ~ gender,
  plot    = TRUE)
##  Read /tmp/RtmpsgvrVb/datasets/GSE161731/GSE161731_counts.csv.gz
##      Merge sdata: /tmp/RtmpsgvrVb/datasets/GSE161731/GSE161731_counts_key.csv.gz
##      Preprocess
##          Keep 18974/60675 features: count >= 10 (~gender)
##          counts: add 0.5
##          cpm:    tmm scale libsizes
##              cpm
##          voom: ~gender
##          counts: rm 0.5
##               Add PCA
##     LinMod
##               Code gender
##                                level
##                     coefficient Female Male
##                       Intercept  1      .  
##                       Male      -1      1
##               lmFit(~gender, weights = assays(object)$weights)
##                      coefficient    fit downfdr upfdr downp   upp
##                           <fctr> <fctr>   <int> <int> <int> <int>
##                   1:   Intercept  limma    2990 14603  3005 14613
##                   2:        Male  limma      40    74  1207  1447
##      Plot summary
## Warning: ggrepel: 96 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps