% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/MsBackendPython.R
\name{MsBackendPy}
\alias{MsBackendPy}
\alias{backendInitialize,MsBackendPy-method}
\alias{length,MsBackendPy-method}
\alias{spectraVariables,MsBackendPy-method}
\alias{spectraData,MsBackendPy-method}
\alias{spectraData<-,MsBackendPy-method}
\alias{peaksData,MsBackendPy-method}
\alias{peaksData<-,MsBackendPy-method}
\alias{$,MsBackendPy-method}
\alias{$<-,MsBackendPy-method}
\alias{intensity<-,MsBackendPy-method}
\alias{mz<-,MsBackendPy-method}
\alias{spectraVariableMapping<-,MsBackendPy-method}
\alias{spectraVariableMapping,MsBackendPy-method}
\alias{reindex}
\title{An MS data backend for MS data stored in Python}
\usage{
\S4method{backendInitialize}{MsBackendPy}(
  object,
  pythonVariableName = character(),
  spectraVariableMapping = defaultSpectraVariableMapping(),
  pythonLibrary = c("matchms", "spectrum_utils"),
  ...,
  data
)

\S4method{length}{MsBackendPy}(x)

\S4method{spectraVariables}{MsBackendPy}(object)

\S4method{spectraData}{MsBackendPy}(object, columns = spectraVariables(object), drop = FALSE)

\S4method{spectraData}{MsBackendPy}(object) <- value

\S4method{peaksData}{MsBackendPy}(object, columns = c("mz", "intensity"), drop = FALSE)

\S4method{peaksData}{MsBackendPy}(object) <- value

\S4method{$}{MsBackendPy}(x, name)

\S4method{$}{MsBackendPy}(x, name) <- value

\S4method{intensity}{MsBackendPy}(object) <- value

\S4method{mz}{MsBackendPy}(object) <- value

\S4method{spectraVariableMapping}{MsBackendPy}(object) <- value

\S4method{spectraVariableMapping}{MsBackendPy}(object, value)

reindex(object)
}
\arguments{
\item{object}{A \code{MsBackendPy} object.}

\item{pythonVariableName}{For \code{backendInitialize()}: \code{character(1)} with the
name of the variable/Python attribute that contains the list of
\code{matchms.Spectrum} objects with the MS data.}

\item{spectraVariableMapping}{For \code{backendInitialize()}: named \code{character}
with the mapping between spectra variable names and (\code{matchms.Spectrum})
metadata names. See \code{\link[=defaultSpectraVariableMapping]{defaultSpectraVariableMapping()}}, and the
description of the \code{backendInitialize()} function for \code{MsBackendPy}
for more information and details.}

\item{pythonLibrary}{For \code{backendInitialize()}: \code{character(1)} specifying
the Python library used to represent the MS data in Python. Can be
either \code{pythonLibrary = "matchms"} (the default) or
\code{pythonLibrary = "spectrum_utils"}.}

\item{...}{Additional parameters.}

\item{data}{For \code{backendInitialize()}: \code{DataFrame} with the full MS data
(peaks data and spectra data) such as extracted with the
\code{\link[Spectra:spectraData]{Spectra::spectraData()}} method on another \code{MsBackend} instance.
Importantly, the \code{DataFrame} must have columns \code{"mz"} and
\code{"intensity"} with the full MS data.}

\item{x}{A \code{MsBackendPy} object.}

\item{columns}{For \code{spectraData()}: \code{character} with the names of
columns (spectra variables) to retrieve. Defaults to
\code{spectraVariables(object)}. For \code{peaksData()}: \code{character} with the
names of the peaks variables to retrieve.}

\item{drop}{For \code{spectraData()} and \code{peaksData()}: \code{logical(1)} whether,
when a single column is requested, the data should be returned as a
\code{vector} instead of a \code{data.frame} or \code{matrix}.}

\item{value}{Replacement value(s).}

\item{name}{For \code{$}: \code{character(1)} with the name of the variable to
retrieve.}
}
\value{
See description of individual functions for their return values.
}
\description{
The \code{MsBackendPy} provides direct access from R to MS data stored as
\code{matchms.Spectrum} or \code{spectrum_utils.spectrum.MsmsSpectrum} objects from the
\href{https://github.com/matchms/matchms}{\emph{matchms}} and
\href{https://github.com/bittremieux-lab/spectrum_utils}{\emph{spectrum_utils}} Python
libraries, respectively. The MS data (peaks data or spectra variables) are
translated on-the-fly when accessed. Thus, the \code{MsBackendPy} allows a
seamless integration of Python MS data structures into \code{\link[Spectra:Spectra]{Spectra::Spectra()}}
based analysis workflows.

The \code{MsBackendPy} object supports replacing values for peaks variables
(\emph{m/z} and intensity) and adding/replacing or removing spectra variables.
The changes are immediately translated and written back to the Python
variable.

See the description of the \code{backendInitialize()} method below for creation
and initialization of objects from this class. Also, the \code{setBackend()}
method for \code{\link[Spectra:Spectra]{Spectra::Spectra()}} objects internally uses
\code{backendInitialize()}, thus the same parameters can (and have to) be passed
if the backend of a \code{Spectra} object is changed to \code{MsBackendPy} using
the \code{setBackend()} method. Special care should also be given to parameter
\code{spectraVariableMapping}, which defines which spectra variables should be
considered/translated and how their names should or have to be converted
between R and Python. See the description for \code{backendInitialize()} and the
package vignette for details and examples.
}
\details{
The \code{MsBackendPy} keeps only a reference to the MS data in Python (i.e. the
name of the variable in Python) as well as an index pointing to the
individual spectra in Python but no other data. Any data requested from
the \code{MsBackendPy} is accessed and translated on-the-fly from the Python
variable. The \code{MsBackendPy} is thus an interface to the MS data, but not
a data container. All changes to the MS data in the Python variable
(performed e.g. in Python) immediately affect any \code{MsBackendPy} instances
pointing to this variable.

Special care must be taken if the MS data structure in Python is subset or
its order is changed (e.g. by another process). In that case it might be
necessary to re-index the backend using the \code{reindex()} function:
\code{object <- reindex(object)}. This will update (replace) the index to the
individual spectra in Python which is stored within the backend.
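
A minimal sketch of this re-indexing step, assuming a backend \code{be} that
references a Python variable named \code{"s_p"} (as created in the examples
below) and that \emph{reticulate}'s \code{py_run_string()} is used to shorten the
list from R. The subset to the first two spectra is purely illustrative:
\preformatted{
library(reticulate)

## Assumption: `be` was created with backendInitialize(MsBackendPy(), "s_p").
## Shorten the Python list outside of the backend.
py_run_string("s_p = s_p[0:2]")

## Replace the index stored in the backend so that it matches the
## current content of the Python variable.
be <- reindex(be)
length(be)
}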
}
\note{
As mentioned in the \emph{details} section, the MS data is stored entirely in
Python and the backend only references this data through the name of
the variable in Python. Thus, each time MS data is requested from the
backend, it is retrieved in its \strong{current} state.
If for example data was transformed or metadata added or removed in the
Python object, it immediately affects the \code{Spectra}/backend.

Any replacement operation uses internally the \verb{spectraData()<-} method,
thus replacing/updating values for individual spectra variables or peaks
variables will first load the current data from Python to R, update or
replace the values and then store the full MS data again to the
referenced Python attribute.
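
The round trip described above can be sketched with the documented accessors.
This is an illustration under assumptions, not the internal implementation:
\code{be} is assumed to be an initialized \code{MsBackendPy}, and
\code{spectraData()} is assumed to also return the \code{"mz"} and \code{"intensity"}
columns when they are requested.
\preformatted{
## Load the current full MS data from Python to R.
d <- spectraData(be, c(spectraVariables(be), "mz", "intensity"))

## Update one spectra variable in R.
d$msLevel <- rep(2L, length(be))

## Store the full MS data again in the referenced Python variable.
spectraData(be) <- d
}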
}
\section{\code{MsBackendPy} methods}{


The \code{MsBackendPy} supports all methods defined by the \code{\link[Spectra:MsBackend]{Spectra::MsBackend()}}
interface for access to MS data. Details on the individual functions can also
be found in the main documentation in the \emph{Spectra} package (i.e. for
\code{\link[Spectra:MsBackend]{Spectra::MsBackend()}}). Here we provide information for functions with
properties specific to this backend.
\itemize{
\item \code{backendInitialize()}: this method can be used to either initialize the
backend with data from a referenced and \strong{existing} MS data structure
in Python, or, through parameter \code{data}, first convert and store the
provided data to a Python MS data structure and then initialize the
backend pointing to this referenced variable (Python attribute). In both
cases, the name of the Python attribute needs to be provided with the
parameter \code{pythonVariableName}.
The mapping between the spectra variable names in R and the related
Python metadata variables can be specified with the
\code{spectraVariableMapping} parameter. It has to be a named \code{character} with
names being the spectra variables and the values the respective name for
the metadata in the Python MS data structure. It defaults to
\code{\link[=defaultSpectraVariableMapping]{defaultSpectraVariableMapping()}} which returns the mapping of some
core spectra variables for the \emph{matchms} Python library. Be aware that
only those spectra variables specified with this parameter are mapped and
translated between R and Python. For \code{backendInitialize()} with parameter
\code{data} provided, only the variables defined by \code{spectraVariableMapping},
and available in \code{data}, are converted and stored in Python. Also note
that, for efficiency reasons, core spectra variables (those listed by
\code{\link[Spectra:spectraData]{Spectra::coreSpectraVariables()}}) that are defined with
\code{spectraVariableMapping} but contain only missing values are ignored.
Parameter \code{pythonLibrary} must be used to specify the Python library
representing the MS data in Python. It can be either
\code{pythonLibrary = "matchms"} (the default) or
\code{pythonLibrary = "spectrum_utils"}. The function returns an initialized
instance of \code{MsBackendPy}. See examples below for different settings
and conversion of spectra variables.
\item \code{intensity()}, \verb{intensity()<-}: get or replace the intensity values.
\code{intensity()} returns a \code{NumericList} of length equal to the number of
spectra with each element being the intensity values of the individual
mass peaks per spectrum. \verb{intensity()<-} takes the same list-like
structure as input parameter. The number of spectra and the number of
peaks per spectrum in \code{value} must match those of the backend. To change
the number of peaks use the \verb{peaksData()<-} method instead, which
replaces the \emph{m/z} and intensity values at the same time.
Calling \verb{intensity()<-} will replace the full MS data (spectra variables
as well as peaks variables) of the associated Python variable.
\item \code{mz()}, \verb{mz()<-}: get or replace the \emph{m/z} values. \code{mz()} returns a
\code{NumericList} of length equal to the number of spectra with each element
being the \emph{m/z} values of the individual mass peaks per spectrum.
\verb{mz()<-} takes the same list-like structure as input parameter. The
number of spectra and the number of peaks per spectrum in \code{value} must
match those of the backend. To change the number of peaks use the
\verb{peaksData()<-} method instead, which replaces the \emph{m/z} and
intensity values at the same time.
Calling \verb{mz()<-} will replace the full MS data (spectra variables
as well as peaks variables) of the associated Python variable.
\item \code{peaksData()}: extracts the peaks data matrices from the backend. Python
code is applied to the data structure in Python to
extract the \emph{m/z} and intensity values as a list of (numpy) arrays. These
are then translated into an R \code{list} of two-column \code{numeric} matrices.
Because (numpy) arrays do not carry column names, an additional
loop in R is required to set the column names to \code{"mz"} and \code{"intensity"}.
\item \verb{peaksData()<-}: replaces the full peaks data (i.e., \emph{m/z} and intensity
values) for all spectra. Parameter \code{value} has to be a \code{list}-like
structure with each element being a \code{numeric} matrix with one column
(named \code{"mz"}) containing the spectrum's \emph{m/z} and one column (named
\code{"intensity"}) with the intensity values. This method will replace the
full data of the associated Python variable (i.e., both the spectra as
well as the peaks data).
\item \code{spectraData()}: extracts the spectra data from the backend. Which spectra
variables are translated and retrieved from the Python objects depends on
the backend's \code{spectraVariableMapping()}. All metadata names defined are
retrieved and added to the returned \code{DataFrame} (with any missing
\emph{core} spectra variables filled with \code{NA}).
\item \verb{spectraData()<-}: replaces the full spectra (+ peaks) data of the backend
with the values provided with the submitted \code{DataFrame}. The number of
rows of this \code{DataFrame} has to match the number of spectra of \code{object}
(i.e., being equal to \code{length(object)}) and the \code{DataFrame} must also
contain the spectra's \emph{m/z} and intensity values.
\item \code{spectraVariables()}: retrieves available spectra variables, which include
the names of all metadata attributes in the \code{matchms.Spectrum} objects
and the \emph{core} spectra variables \code{\link[Spectra:spectraData]{Spectra::coreSpectraVariables()}}.
\item \code{spectraVariableMapping()}: get the currently defined mapping for
\code{spectraVariables()} of the backend.
\item \verb{spectraVariableMapping<-}: replaces the \code{spectraVariableMapping} of the
backend (see \code{\link[=setSpectraVariableMapping]{setSpectraVariableMapping()}} for details and description
of the expected format).
\item \code{$}, \verb{$<-}: extract or add/replace values for a spectra variable from/in
the backend. Replacing or adding values for a spectra variable causes the
full data to be replaced. In detail, first the full data is retrieved from
Python, then the values are added/replaced and then the data is again
transferred to Python.
}
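
The replacement methods listed above could be combined as in the following
sketch. It assumes an initialized \code{MsBackendPy} named \code{be} with
non-empty spectra; the scaling factor and the 1 percent threshold are arbitrary
choices for illustration.
\preformatted{
## Rescale intensities: the number of spectra and of peaks is unchanged,
## hence `intensity()<-` can be used.
intensity(be) <- intensity(be) / 2

## Remove low-intensity peaks: the number of peaks changes, hence m/z and
## intensity values are replaced together with `peaksData()<-`.
pd <- lapply(peaksData(be), function(z) {
    z[z[, "intensity"] >= 0.01 * max(z[, "intensity"]), , drop = FALSE]
})
peaksData(be) <- pd
}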
}

\section{Additional helper and utility functions}{

\itemize{
\item \code{reindex()}: update the internal \emph{index} to match \code{1:length(object)}.
This function is useful if the original data referenced by the backend was
subset or re-ordered by a different process (or a function in Python).
}
}

\examples{

## Loading an example MGF file provided by the SpectriPy package.
## As an alternative, the data could also be imported directly in Python
## using:
## import matchms
## from matchms.importing import load_from_mgf
## s_p = list(load_from_mgf(r.fl))
library(Spectra)
library(SpectriPy)
library(MsBackendMgf)

fl <- system.file("extdata", "mgf", "test.mgf", package = "SpectriPy")
s <- Spectra(fl, source = MsBackendMgf())
s

## Translating the MS data to Python and assigning it to a variable
## named "s_p" in the (*reticulate*'s) `py` Python environment. Assigning
## the variable to the Python environment has performance advantages, as
## any Python code applied to the MS data does not require any data
## conversions.
py_set_attr(py, "s_p", rspec_to_pyspec(s))

## Create a `MsBackendPy` representing an interface to the data in the
## "s_p" variable in Python:
be <- backendInitialize(MsBackendPy(), "s_p")
be

## Alternatively, by passing the full MS data with parameter `data`, the
## data is first converted to Python and the backend is initialized with
## that data. The `setBackend()` call shown further below internally uses
## this code to convert the data.
be <- backendInitialize(MsBackendPy(), "s_p3",
    data = spectraData(s, c(spectraVariables(s), "mz", "intensity")))

## Create a Spectra object with this backend:
s_2 <- Spectra(be)
s_2

## An easier way to change the data representation of a `Spectra` object
## from R to Python is to use the `Spectra`'s `setBackend()` method
## selecting a `MsBackendPy` as the target backend representation:
s_2 <- setBackend(s, MsBackendPy(), pythonVariableName = "s_p2")
s_2

## This moved the data from R to Python, storing it in a Python variable
## with the name `s_p2`. The resulting `s_2` is thus a `Spectra` object
## whose MS data is stored entirely in Python.

## Note that by default only spectra variables that are part of
## `defaultSpectraVariableMapping()` are converted to Python
defaultSpectraVariableMapping()

## Thus, for example the precursor m/z is available in `s_2`, but other
## spectra variables from `s`, such as `"SMILES"` are not:
precursorMz(s)
precursorMz(s_2)

s$SMILES |> head()
## s_2$SMILES would throw an error.

## To also translate this spectra variable, it needs to be included and
## specified with the `spectraVariableMapping` parameter. The easiest
## approach is to use the `spectraVariableMapping()` function, which adds
## the mapping of additional spectra variables to the default mapping for
## the Python library (`"matchms"`):
s_2 <- setBackend(s, MsBackendPy(), pythonVariableName = "s_p2",
    spectraVariableMapping = spectraVariableMapping("matchms", c(SMILES = "smiles")))
s_2$SMILES |> head()

## Available spectra variables: these include, next to the *core* spectra
## variables, also the names of all metadata stored in the `matchms.Spectrum`
## objects.
spectraVariables(s_2)

## Get the full peaks data:
peaksData(s_2)

## Get the peaks from the first spectrum
peaksData(s_2)[[1L]]

## Get the full spectra data:
spectraData(s_2)

## Get the m/z values
mz(s_2)

## Plot the first spectrum
plotSpectra(s_2[1L])


########
## Using the spectrum_utils Python library

## Below we convert the data to a list of `MsmsSpectrum` objects from the
## spectrum_utils library.
py_set_attr(py, "su_p", rspec_to_pyspec(s,
    spectraVariableMapping("spectrum_utils"), "spectrum_utils"))

## Create a MsBackendPy representing this data. Importantly, we need to
## specify the Python library using the `pythonLibrary` parameter and
## ideally also set the `spectraVariableMapping` to the one specific for
## that library.
be <- backendInitialize(MsBackendPy(), "su_p",
    spectraVariableMapping = spectraVariableMapping("spectrum_utils"),
    pythonLibrary = "spectrum_utils")
be

## Get the peaks data for the first 3 spectra
peaksData(be[1:3])

## Get the full spectraData
spectraData(be)

## Extract the precursor m/z
be$precursorMz
}
\author{
Johannes Rainer and the EuBIC hackathon team
}
