1 Workflow environment

systemPipeR workflows can be designed and built from start to finish with a single command, importing from an R Markdown file or stepwise in interactive mode from the R console.

This tutorial demonstrates how to build the workflow in interactive mode, appending one step at a time. The workflow is constructed by connecting the steps with the appendStep method. Each SYSargsList instance contains the instructions needed to process a set of input files with a specific command-line or R software, together with the paths to the corresponding outfiles generated by that tool/step.

The workflow demonstrates a simple example of a typical data analysis workflow containing both R and command-line (CL) steps. The workflow consists of four steps:

  1. R step: export tabular data to files
  2. CL step: compress files
  3. CL step: uncompress files
  4. R step: import files and plot summary statistics

Refer to our vignette for how to construct the workflow steps. If you prefer a more feature-rich template, run systemPipeRdata::availableWF() to see the pre-configured templates.

cat(crayon::blue$bold("To use this workflow, the following R packages are expected:\n"))
cat(c("'ggplot2'\n"), sep = "', '")
### pre-end
library(systemPipeR)
sal <- SPRproject()
sal

1.1 Load packages

appendStep(sal) <- LineWise(code = {
    library(systemPipeR)
}, step_name = "load_library")

1.2 Export dataset to file

  • Add the first step as an R step using the LineWise function
appendStep(sal) <- LineWise(code = {
    # write one CSV file per species into the results directory
    mapply(FUN = function(x, y) write.csv(x, y), x = split(iris,
        factor(iris$Species)), y = file.path("results", paste0(names(split(iris,
        factor(iris$Species))), ".csv")))
}, step_name = "export_iris", dependency = "load_library")
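
To make the effect of this step concrete, here is a minimal base R sketch of the same export, run outside the workflow. It assumes a temporary directory as a stand-in for the project's results directory; the names out_dir, by_species and csv_paths are illustrative only and not part of the template.

```r
# Stand-in for the project's results/ directory (assumption: a temp dir)
out_dir <- file.path(tempdir(), "results_sketch")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Split iris once and reuse the pieces for both the data and the file names
by_species <- split(iris, iris$Species)
csv_paths <- file.path(out_dir, paste0(names(by_species), ".csv"))

# Write one CSV per species; write.csv() stores the row names as the first column
invisible(Map(function(d, p) write.csv(d, p), by_species, csv_paths))

basename(csv_paths)
```

Because write.csv() adds the row names as a leading column, the import step later drops the first column after reading each file back.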

1.3 Compress data

  • Add the second step as a command-line step using SYSargsList
targetspath <- system.file("extdata/cwl/gunzip", "targets_gunzip.txt",
    package = "systemPipeR")
appendStep(sal) <- SYSargsList(step_name = "gzip", targets = targetspath,
    dir = TRUE, wf_file = "gunzip/workflow_gzip.cwl", input_file = "gunzip/gzip.yml",
    dir_path = "param/cwl", inputvars = c(FileName = "_FILE_PATH_",
        SampleName = "_SampleName_"), dependency = "export_iris")
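
The inputvars argument pairs column names of the targets file with placeholders used in the parameter file. The following base R sketch illustrates the idea of that substitution only; it is not systemPipeR's internal implementation. The targets table, the template lines and the helper fill_template are hypothetical and do not reproduce the files shipped with the package.

```r
# Hypothetical targets table: one row per sample
targets <- data.frame(
  FileName = c("results/setosa.csv", "results/virginica.csv"),
  SampleName = c("SE", "VI"),
  stringsAsFactors = FALSE
)

# Hypothetical parameter template containing the placeholders
template <- c("file: _FILE_PATH_", "name: _SampleName_")

# Same mapping as in the step above: targets column -> placeholder
inputvars <- c(FileName = "_FILE_PATH_", SampleName = "_SampleName_")

# Replace every placeholder with the value from one targets row
fill_template <- function(row, template, inputvars) {
  for (col in names(inputvars)) {
    template <- gsub(inputvars[[col]], row[[col]], template, fixed = TRUE)
  }
  template
}

filled <- lapply(seq_len(nrow(targets)), function(i) {
  fill_template(targets[i, ], template, inputvars)
})
names(filled) <- targets$SampleName
filled$SE
```

Each sample ends up with its own filled-in parameter set, which is why one step definition can process every row of the targets file.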

1.4 Decompress data

  • Add the third step as a command-line step using SYSargsList
appendStep(sal) <- SYSargsList(step_name = "gunzip", targets = "gzip",
    dir = TRUE, wf_file = "gunzip/workflow_gunzip.cwl", input_file = "gunzip/gunzip.yml",
    dir_path = "param/cwl", inputvars = c(gzip_file = "_FILE_PATH_",
        SampleName = "_SampleName_"), rm_targets_col = "FileName",
    dependency = "gzip")
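
The compress and decompress steps should leave the file content unchanged. As a rough base R analogue of that round trip (the workflow itself calls the gzip and gunzip command-line tools through CWL, not this code), the sketch below writes a file through a gzfile connection and reads it back. The file names are illustrative.

```r
# Create a small CSV to compress (illustrative file name in a temp dir)
src <- file.path(tempdir(), "setosa_sketch.csv")
write.csv(iris[iris$Species == "setosa", ], src)
original <- readLines(src)

# Compress: write the lines through a gzip connection
gz_path <- paste0(src, ".gz")
con <- gzfile(gz_path, "wt")
writeLines(original, con)
close(con)

# Decompress: read the lines back from the gzip connection
con <- gzfile(gz_path, "rt")
restored <- readLines(con)
close(con)

# The round trip should reproduce the original content
identical(restored, original)
```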

1.5 Import data back to R and perform statistical analysis and visualization

  • Add the fourth step as an R step using LineWise
appendStep(sal) <- LineWise(code = {
    # combine all files into one data frame
    df <- lapply(getColumn(sal, step = "gunzip", "outfiles"),
        function(x) read.delim(x, sep = ",")[-1])
    df <- do.call(rbind, df)
    # calculate mean and sd of each of the four measurement columns (all species pooled)
    stats <- data.frame(cbind(mean = apply(df[, 1:4], 2, mean),
        sd = apply(df[, 1:4], 2, sd)))
    stats$species <- rownames(stats)
    # plot
    plot <- ggplot2::ggplot(stats, ggplot2::aes(x = species,
        y = mean, fill = species)) + ggplot2::geom_bar(stat = "identity",
        color = "black", position = ggplot2::position_dodge()) +
        ggplot2::geom_errorbar(ggplot2::aes(ymin = mean - sd,
            ymax = mean + sd), width = 0.2, position = ggplot2::position_dodge(0.9))
    plot
}, step_name = "stats", dependency = "gunzip", run_step = "optional")
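
The summary in this step can be tried directly on the built-in iris data, without running the earlier steps. Note that apply(df[, 1:4], 2, ...) works column by column, so it yields one row per measurement rather than one per species. The sketch below reproduces that table and, as an alternative that is not part of the template, shows a per-species mean with aggregate(). The names measurements, stats_tbl and per_species are illustrative.

```r
# The four numeric measurement columns of iris
measurements <- iris[, 1:4]

# One row per measurement column, pooled over all species
stats_tbl <- data.frame(
  mean = sapply(measurements, mean),
  sd = sapply(measurements, sd)
)
stats_tbl$measurement <- rownames(stats_tbl)
stats_tbl

# Alternative (not in the template): mean of each measurement per species
per_species <- aggregate(. ~ Species, data = iris, FUN = mean)
per_species
```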

1.6 Version Information

appendStep(sal) <- LineWise(code = {
    sessionInfo()
}, step_name = "sessionInfo", dependency = "stats")

2 Manage the workflow

2.1 Interactive job submissions on a single machine

To run the workflow, the runWF function executes all the steps stored in the workflow container. The execution takes place on a single machine, without submitting jobs to the queuing system of a computer cluster.

sal <- runWF(sal)
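
The dependency arguments given to the steps above form a chain from load_library to sessionInfo. As an illustration of how such declarations determine a valid run order, here is a small base R sketch. It is not how runWF is implemented; the deps list and the run_order helper are hypothetical and only mirror the step names used in this tutorial.

```r
# Step name -> names of the steps it depends on (taken from the tutorial steps)
deps <- list(
  load_library = character(0),
  export_iris = "load_library",
  gzip = "export_iris",
  gunzip = "gzip",
  stats = "gunzip",
  sessionInfo = "stats"
)

# Repeatedly pick the steps whose dependencies have all been completed
run_order <- function(deps) {
  done <- character(0)
  while (length(done) < length(deps)) {
    satisfied <- vapply(deps, function(d) all(d %in% done), logical(1))
    ready <- names(deps)[!(names(deps) %in% done) & satisfied]
    if (length(ready) == 0) stop("circular or missing dependency")
    done <- c(done, ready)
  }
  done
}

run_order(deps)
```

The result does not depend on the order in which the steps are listed, only on the declared dependencies.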

2.2 Visualize workflow

systemPipeR workflow instances can be visualized with the plotWF function.

plotWF(sal, rstudio = TRUE)

2.3 Checking workflow status

To check the summary of the workflow, we can use:

sal
statusWF(sal)

2.4 Accessing the logs report

systemPipeR compiles all the workflow execution logs in one central location, making it easier to check the standard output (stdout) or standard error (stderr) of any command-line tool used in the workflow, as well as the stdout of the R code.

sal <- renderLogs(sal)

3 About the workflow

3.1 Tools used

To check the command-line tools used in this workflow, use listCmdTools. If your system uses environment modules, use listCmdModules to check the modules as well.

The following code prints the tools required by your custom SPR project in the report. If you are running the workflow for the first time and do not have a project yet, or you just want to browse this workflow, the code displays the tools required by default.

if (file.exists(file.path(".SPRproject", "SYSargsList.yml"))) {
    local({
        sal <- systemPipeR::SPRproject(resume = TRUE)
        systemPipeR::listCmdTools(sal)
        systemPipeR::listCmdModules(sal)
    })
} else {
    cat(crayon::blue$bold("Tools and modules required by this workflow are:\n"))
    cat(c("gzip", "gunzip"), sep = "\n")
}
## Tools and modules required by this workflow are:
## gzip
## gunzip
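
Independently of systemPipeR, base R can report whether these two tools are visible on the current PATH. The sketch below uses Sys.which(), which returns an empty string for a tool it cannot find. The table layout and the names tools, found and availability are illustrative.

```r
# Tools named in the output above
tools <- c("gzip", "gunzip")

# Full path of each executable, or "" when it is not found on the PATH
found <- Sys.which(tools)

# Summarise the lookup as a small table
availability <- data.frame(
  tool = tools,
  path = unname(found),
  available = unname(nzchar(found)),
  row.names = NULL
)
availability
```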

3.2 Session Info

This is the session information for rendering this report. To access the session information of the workflow run itself, check the HTML report generated by renderLogs.

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets 
## [6] methods   base     
## 
## other attached packages:
## [1] BiocStyle_2.33.1
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.36       R6_2.5.1           
##  [3] codetools_0.2-20    bookdown_0.40      
##  [5] fastmap_1.2.0       xfun_0.46          
##  [7] cachem_1.1.0        knitr_1.48         
##  [9] htmltools_0.5.8.1   rmarkdown_2.27     
## [11] lifecycle_1.0.4     cli_3.6.3          
## [13] sass_0.4.9          jquerylib_0.1.4    
## [15] compiler_4.4.1      tools_4.4.1        
## [17] evaluate_0.24.0     bslib_0.8.0        
## [19] yaml_2.3.10         formatR_1.14       
## [21] BiocManager_1.30.23 crayon_1.5.3       
## [23] jsonlite_1.8.8      rlang_1.1.4

4 References