1 Introduction

Time Incorporated miR-mRNA Generation of Networks (TimiRGeN) is aimed at researchers who wish to explore interactions in time series microRNA-mRNA expression data. This package integrates, functionally analyses and generates small networks for hypothesis generation.

To achieve data reduction without reducing biological signal, the TimiRGeN package utilises several published packages and employs their functions in a synergistic fashion for time series multi-omic analysis. The following packages have been built upon for several functions in the TimiRGeN package:

rWikiPathways [1], clusterProfiler [2], DOSE [3], biomaRt [4], RCy3 [5], Mfuzz [6], igraph [7].

TimiRGeN is very selective and only uses miR-mRNA interaction data from databases curated within the last 2 years. To reduce the number of false-positives, TimiRGeN also only uses predictive databases which use seed site specificity as their main input.

TargetScan [8], miRDB [9], miRTarBase [10].

TimiRGeN can generate networks in R. However, the package is also open ended, as its output can easily be exported to cytoscape [11] or pathvisio [12] for better visualisation options.

TimiRGeN solely uses wikipathways for functional pathway analysis, and is the first tool to allow this for time series data. Wikipathways is a user curated pathway database that contains thousands of mechanistic signalling pathways from multiple species [13]. Furthermore, wikipathways works very well with pathvisio, which is our recommended tool for GRN (gene regulatory network) design. Please read TimiRGeN/inst/Pathvisio_GRN_guide.pdf for a step-by-step tutorial on our GRN creation process.

The TimiRGeN package has several options for miR-mRNA analysis. Currently the package can analyse human or mouse data, perform analysis of miR and mRNA data combined or separately, and can use entrez or ensembl gene IDs. This is because most wikipathways are annotated with either entrez IDs or ensembl gene IDs. This tool can be best used after differential expression (DE) analysis, and has potential to become a staple part of any miR-mRNA expression data study.

1.1 Install TimiRGeN

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("TimiRGeN")

1.2 Load Libraries

library(TimiRGeN)
library(org.Mm.eg.db)

TimiRGeN dependencies loaded with the package can be further investigated from these sources [1-7, 14-22].

Depending on the data to be analysed, please load either org.Mm.eg.db or org.Hs.eg.db before starting the analysis.

2 Combined miR-mRNA analysis

2.1 Example Mouse Kidney Fibrosis Dataset

In this section the combined method will be used to analyse a mouse kidney fibrosis data set. The mRNA data were published in Craciun et al. (2016) [23] and downloaded from GSE65267. The associated miR data were published in Pellegrini et al. (2016) [24] and downloaded from GSE61328.

miR <- mm_miR

mRNA <- mm_mRNA

Notice the standard nomenclature used in the column names. Please follow this standard for your own input data. The time point should come first and is followed by a full stop (.). The time point should consist of alphabetical characters followed by numerical characters e.g. D1, H6, TP3. After the full stop, the column name should state the specific result type from differential expression analysis.

Note. There should only be one . in each column name and no _ characters. Having more than one . or any _ characters will confuse some functions.

Note. There should be no NAs in your miR and mRNA data files.
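The two notes above can be checked before an object is built. The following is a minimal sketch in base R. The helper check_input is hypothetical and is not a TimiRGeN function, and the toy column names and values are only illustrative.

```r
# Hypothetical helper (not part of TimiRGeN): checks the column naming
# convention and the absence of NAs described above.
check_input <- function(df) {
  # letters, then digits, then exactly one ".", then a result type
  # that contains no further "." or "_" characters
  pattern <- "^[A-Za-z]+[0-9]+\\.[^._]+$"
  bad_names <- colnames(df)[!grepl(pattern, colnames(df))]
  list(bad_names = bad_names, has_na = any(is.na(df)))
}

# Toy input: the third column breaks both rules
toy <- data.frame(D1.log2fc = c(1.2, -0.4), D1.adjPVal = c(0.01, 0.30),
                  D2_log2fc = c(0.8, NA))
res <- check_input(toy)
res$bad_names  # "D2_log2fc"
res$has_na     # TRUE
```

Running a check of this kind on both the miR and the mRNA data frames catches naming problems before they reach functions that rely on the time point prefix.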

2.2 Create MultiAssayExperiment object

MAE <- startObject(miR, mRNA)

TimiRGeN uses MultiAssayExperiment (MAE) to contain information. The dataframes and matrices will be stored as assays, S4 objects will be stored as Experiments and the lists will be stored as metadata.

If you are unfamiliar with MultiAssayExperiment objects, please read through the vignette to understand how data can be accessed, or go through the user guide which can be found on the MultiAssayExperiment bioconductor page [22].
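Because later steps refer to stored data by position, e.g. assay(MAE, 1) or metadata(MAE)[[1]], it helps to know how to list what an object contains. The sketch below builds a small toy object directly with the MultiAssayExperiment constructor rather than with startObject, so it does not reproduce the exact layout TimiRGeN creates. The matrix contents and the names miR_toy, mRNA_toy and mae_toy are illustrative assumptions.

```r
library(MultiAssayExperiment)

# Toy matrices standing in for miR and mRNA results (illustrative values)
miR_toy <- matrix(c(1.5, -0.7, 0.01, 0.20), nrow = 2,
                  dimnames = list(c("mmu-miR-21a-5p", "mmu-miR-29b-3p"),
                                  c("D1.log2fc", "D1.adjPVal")))
mRNA_toy <- matrix(c(2.1, -1.3, 0.03, 0.40), nrow = 2,
                   dimnames = list(c("Col1a1", "Tgfb1"),
                                   c("D1.log2fc", "D1.adjPVal")))

# Experiments hold the tables, metadata holds lists and other objects
mae_toy <- MultiAssayExperiment(
  experiments = list(miR = miR_toy, mRNA = mRNA_toy),
  metadata = list(note = "toy example"))

names(experiments(mae_toy))  # names of the stored experiments, in order
assay(mae_toy, 1)            # first experiment, accessed by position
mae_toy[["mRNA"]]            # an experiment accessed by name
metadata(mae_toy)[[1]]       # first metadata element
```

Calling names(experiments(MAE)) and names(metadata(MAE)) after each step is a simple way to confirm which position a given table or list occupies.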

2.3 Retrieve Gene IDs

MAE <- getIdsMirMouse(MAE = MAE, assay(MAE, 1))

MAE <- getIdsMrnaMouse(MAE = MAE, assay(MAE, 2), mirror = "useast")

The getIds functions produce dataframes containing entrezgene and ensembl ID annotations for genes. This is useful for downstream analysis.

Many wikipathways use either entrezgene IDs or ensembl gene IDs for annotation. Having both formats available can be useful.

Due to the nature of miRs, many NAs may be found in the output of getIdsMir functions. Entrezgene IDs and ensembl IDs are insensitive to miRs with -3p and -5p strands. Therefore, adjusted entrezgene IDs and ensembl IDs are also created.
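The strand issue can be illustrated in base R. This sketch only shows why the two arms of a miR end up with the same gene-level identifier. It does not reproduce how the package builds its adjusted IDs, and the miR names are illustrative.

```r
# Illustrative miR names; both arms of miR-21a derive from one precursor gene
mirs <- c("mmu-miR-21a-5p", "mmu-miR-21a-3p", "mmu-miR-29b-3p")

# Drop a trailing -3p or -5p to obtain the gene-level name
gene_level <- sub("-[35]p$", "", mirs)
gene_level                     # "mmu-miR-21a" "mmu-miR-21a" "mmu-miR-29b"

# The two arms of miR-21a now share one name, so a gene-level ID
# cannot tell them apart
anyDuplicated(gene_level) > 0  # TRUE
```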

Note. For getIdsMrna functions, if a connection time out error occurs or if downloads are very slow, try to use other mirrors e.g. mirror = "useast".
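Switching mirrors by hand can be replaced with a small retry loop. The helper try_mirrors below is hypothetical and not part of TimiRGeN. The mirror names are those used by biomaRt, and it is an assumption that the getIdsMrna functions accept all of them. A stand-in function replaces the real download so that the sketch runs without a network connection.

```r
# Hypothetical helper (not part of TimiRGeN): calls fun(mirror = m) for each
# mirror in turn and returns the first result that does not raise an error.
try_mirrors <- function(fun, mirrors = c("useast", "uswest", "asia", "www")) {
  for (m in mirrors) {
    result <- tryCatch(fun(mirror = m), error = function(e) NULL)
    if (!is.null(result)) return(result)
    message("Mirror '", m, "' failed, trying the next one")
  }
  stop("All mirrors failed")
}

# Stand-in for a download step: fails on "useast", succeeds elsewhere
fake_download <- function(mirror) {
  if (mirror == "useast") stop("connection timed out")
  paste("downloaded from", mirror)
}

out <- try_mirrors(fake_download)
out  # "downloaded from uswest"
```

With the package loaded, the same helper could wrap the real call, for example try_mirrors(function(mirror) getIdsMrnaMouse(MAE = MAE, assay(MAE, 2), mirror = mirror)).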

2.4 Filter Out Non-significant Genes

MAE <- combineGenes(MAE = MAE, miR_data = assay(MAE, 1),
    assay(MAE, 2))

MAE <- genesList(MAE = MAE, method = "c",
    genetic_data = assay(MAE, 9), timeString = "D")

MAE <- significantVals(MAE = MAE, method = "c", geneList = metadata(MAE)[[1]], 
    maxVal = 0.05, stringVal = "adjPVal")

mRNA and miR data can be combined using the combineGenes function.

The genesList function will transform the large dataframe into multiple nested dataframes within a list. The data will be separated by the timeString parameter. In this example the data are separated by D (days), because D is the non-numeric character before the . in the column names.
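The idea of separating one wide dataframe into one dataframe per time point can be sketched in base R. This is an illustration of the concept under the naming convention described earlier, not the implementation used by genesList, and the values are invented.

```r
# Toy combined data frame following the naming convention (illustrative values)
combined <- data.frame(D1.log2fc = c(1.2, -0.4), D1.adjPVal = c(0.01, 0.30),
                       D3.log2fc = c(0.8, 2.0), D3.adjPVal = c(0.20, 0.001),
                       row.names = c("Tgfb1", "mmu-miR-21a-5p"))

# Time point = everything before the "." in each column name
time_point <- sub("\\..*$", "", colnames(combined))

# One nested data frame per time point, collected in a list
by_time <- lapply(split(colnames(combined), time_point),
                  function(cols) combined[, cols, drop = FALSE])

names(by_time)        # "D1" "D3"
colnames(by_time$D3)  # "D3.log2fc" "D3.adjPVal"
```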

Significantly differentially expressed genes can be retrieved from each nested dataframe using the significantVals function. In this example, only genes with an adjusted P value of less than 0.05 remain in the list.
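The filtering step can be pictured as applying the same threshold to every dataframe in the list. The sketch below is a base R illustration with invented values. The helper filter_sig is hypothetical. It borrows the argument names maxVal and stringVal for readability but is not the code behind significantVals.

```r
# Toy list of per-time-point data frames (illustrative values)
genes <- c("Tgfb1", "Col1a1", "mmu-miR-21a-5p")
by_time <- list(
  D1 = data.frame(D1.log2fc = c(1.2, -0.4, 0.9),
                  D1.adjPVal = c(0.01, 0.30, 0.04), row.names = genes),
  D3 = data.frame(D3.log2fc = c(0.8, 2.0, 0.1),
                  D3.adjPVal = c(0.20, 0.001, 0.70), row.names = genes))

# Hypothetical helper: keep rows whose value in the first column matching
# stringVal is below maxVal
filter_sig <- function(df, maxVal = 0.05, stringVal = "adjPVal") {
  col <- grep(stringVal, colnames(df), fixed = TRUE)[1]
  df[df[[col]] < maxVal, , drop = FALSE]
}

sig <- lapply(by_time, filter_sig)
rownames(sig$D1)  # "Tgfb1" "mmu-miR-21a-5p"
rownames(sig$D3)  # "Col1a1"
```

Each time point keeps its own set of genes, which is why the later enrichment step produces a separate result per time point.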

2.4.1 Find Significant Gene IDs

MAE <- addIds(MAE = MAE, method = "c", filtered_genelist = metadata(MAE)[[2]], 
    miR_IDs = assay(MAE, 3), mRNA_IDs = assay(MAE, 7))

MAE <- eNames(MAE = MAE, method = "c", gene_IDs = metadata(MAE)[[3]], 
    ID_Column = 4)

Now entrezgene IDs or ensembl IDs which were created before can be integrated into the filtered dataframes of genes using addIds. In this example entrezgene IDs were added.

Lists of entrez IDs or ensembl IDs can be extracted for further analysis using the eNames function.

2.5 Time Dependent Pathway Enrichment Method

Once we have a list of significant genes per time point we can put this through gene set enrichment analysis to find enriched pathways in each time point in the data. TimiRGeN uses wikipathways [13] for GSEA.

MAE2 <- MultiAssayExperiment()

MAE2 <- dloadGmt(MAE = MAE2, speciesInitials = "Mm")

This is standard GSEA. Here the enrichWiki function wraps around enrichment functions from DOSE and clusterProfiler [2,3], but applies them to time series analysis with wikipathways.

Note. Making multiple separate MAE objects makes it easier to work with all the generated data files.

MAE2 <- enrichWiki(MAE = MAE2, method = "c", ID_list = metadata(MAE)[[4]], 
    orgDB = org.Mm.eg.db, path_gene = assay(MAE2, 1), path_name = assay(MAE2, 
        2), ID = "ENTREZID", universe = assay(MAE2, 1)[[2]])

path_gene and path_name can be found as output from the dloadGmt function.

For a more stringent check, a unique universe can be used e.g. all possible genes found in a microarray or all known genes expressed in a cell type.
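The effect of the universe can be seen with a small calculation. The sketch assumes that the enrichment statistic is a hypergeometric over-representation test, as used by the clusterProfiler enricher function. Whether enrichWiki calls that function internally is an assumption here. The helper ora_p and all numbers are illustrative.

```r
# Upper-tail hypergeometric p-value for observing at least k pathway genes
# in a list of n genes, given M pathway genes in a universe of N genes
ora_p <- function(k, M, n, N) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Same overlap, two background sizes (illustrative numbers)
p_genome   <- ora_p(k = 8, M = 50, n = 100, N = 20000)  # all annotated genes
p_measured <- ora_p(k = 8, M = 50, n = 100, N = 5000)   # genes on the array

# The smaller, more specific universe gives the larger p-value
p_measured > p_genome  # TRUE
```

A background restricted to the genes that could actually have been detected makes the same overlap less surprising, which is why it acts as the more stringent check.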

2.5.1 Plot GSEA

savePlots(largeList = metadata(MAE2)[[1]], maxInt = 5, quickType = quickDot, 
    fileType = "jpg")

To plot results from GSEA, the savePlots function can save all plots in the current working directory. Either bar plots or dot plots can be generated by using quickBar or quickDot respectively, and the plots can be saved to file in a variety of formats.