1 Introduction

bioassayR is a tool for cross-target analysis of biological screening data. It allows users to store, organize, and systematically analyze data from a large number of small molecule bioactivity experiments of heterogenous design. Users have the option of supplying their own bioactivity data for analysis, or downloading a database from the authors website (http://chemmine.ucr.edu/bioassayr) pre-loaded with bioactivity data sourced from NCBI PubChem BioAssay (Backman, Cao, and Girke 2011; Wang et al. 2012). The pre-loaded database contains the results of thousands of bioassay experiments, where small molecules were screened against a defined biological target. bioassayR allows users to leverage these data as a reference to identify small molecules active against a protein or organism of interest, identify target selective compounds that may be useful as drugs or chemical genomics probes, and identify and compare the activity profiles of small molecules.

The design of bioassayR is based around four distinct objects, each which is optimized for investigating bioactivity data in different ways. The bioassay object stores activity data for a single assay, and also acts as a gateway for importing new activity data, and editing the data for a single assay. The bioassayDB object uses a SQL database to store and query thousands of assays, in a time-efficient manner. Often users will wish to further investigate a set of compounds or assays identified with a bioassayDB query by pulling them out into a bioassaySet object. The bioassaySet object stores activity data as a compound vs assay matrix, and can be created from a list of assay ids or compounds of interest. Lastly, the perTargetMatrix is a matrix of compounds vs targets, where replicates (assays hitting the same target) from a bioassaySet object are summarized into a single value. This internally uses a sparse matrix to save system memory, while allowing the user to leverage R language matrix features to further investigate these data.