21 July | BioC 2015

Goals for this workshop

  • Learn how to read and write images in and out of R

  • Learn how images are represented in R and how to manipulate them

  • Understand how to apply filters and transformations to images

  • Apply these skills to microscopy images of cells to do segmentation and feature extraction

  • Explore spatial distributions of the position of cells

EBImage

Image processing and analysis toolbox for R

  • Reading and writing of image files
  • Interactive image viewer
  • Image manipulation, transformation and filtering
  • Object detection and feature extraction


Since Bioconductor 1.8 (2006)

Original developers:

Oleg Sklyar
Wolfgang Huber
Mike Smith
Gregoire Pau

Contributors:

Joseph Barry
Bernd Fischer
Ilia Kats
Philip A. Marais

Let's get started!

library(EBImage)

f = system.file("images", "sample.png", package="EBImage")
img = readImage(f)

display(img)

Reading and displaying images

Reading images

Images can be read from local files or URLs.

bioc = readImage("http://www.bioconductor.org/images/logo/jpg/bioconductor_logo_rgb.jpg")
display(bioc)
EBImage supports JPEG, PNG and TIFF file formats.
For reading proprietary microscopy image data and metadata use RBioFormats.

Displaying images

  • interactive JavaScript viewer
  • R's build-in plotting device

The default display method can be set by options(EBImage.display).

options(EBImage.display = "raster")

Adding text labels

display(img, method = "raster")
text(x = 20, y = 20, label = "Parrots", adj = c(0,1), col = "orange", cex = 2)
filename = "parrots.jpg"
dev.print(jpeg, filename = filename , width = dim(img)[1], height = dim(img)[2])
file.size(filename)
## [1] 37820

Writing images

Supported file formats: JPEG, PNG and TIFF.

writeImage(img, "sample.jpeg", quality = 85)

writeImage(img, "sample.tiff")
writeImage(img, "sample_compressed.tiff", compression = "deflate")

files = list.files(pattern = "sample*")
data.frame(row.names=files, size=file.size(files))
##                          size
## sample.jpeg             49688
## sample.tiff            393384
## sample_compressed.tiff 293788

Image representation

Multi-dimensional pixel intensity arrays 

  • (x, y)
  • (x, y, z) z-stack
  • (x, y, t) time-lapse
  • (x, y, c) channels
  • (x, y, c, z, t, …)

Image representation

str(img)
## Formal class 'Image' [package "EBImage"] with 2 slots
##   ..@ .Data    : num [1:768, 1:512] 0.447 0.451 0.463 0.455 0.463 ...
##   ..@ colormode: int 0
getClassDef("Image")
## Class "Image" [package "EBImage"]
## 
## Slots:
##                           
## Name:      .Data colormode
## Class:     array   integer
## 
## Extends: 
## Class "array", from data part
## Class "structure", by class "array", distance 2
## Class "vector", by class "array", distance 3, with explicit coerce
dim(img)
## [1] 768 512

Image summary

img
## Image 
##   colorMode    : Grayscale 
##   storage.mode : double 
##   dim          : 768 512 
##   frames.total : 1 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6]
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.4470588 0.4627451 0.4784314 0.4980392 0.5137255 0.5294118
## [2,] 0.4509804 0.4627451 0.4784314 0.4823529 0.5058824 0.5215686
## [3,] 0.4627451 0.4666667 0.4823529 0.4980392 0.5137255 0.5137255
## [4,] 0.4549020 0.4666667 0.4862745 0.4980392 0.5176471 0.5411765
## [5,] 0.4627451 0.4627451 0.4823529 0.4980392 0.5137255 0.5411765
imageData(img)[1:3, 1:6]
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.4470588 0.4627451 0.4784314 0.4980392 0.5137255 0.5294118
## [2,] 0.4509804 0.4627451 0.4784314 0.4823529 0.5058824 0.5215686
## [3,] 0.4627451 0.4666667 0.4823529 0.4980392 0.5137255 0.5137255

Image histogram

hist(img)
range(img)
## [1] 0 1

Color images

f = system.file("images", "sample-color.png", package="EBImage")
imgcol = readImage(f)
display(imgcol)
print(imgcol, short = TRUE)
## Image 
##   colorMode    : Color 
##   storage.mode : double 
##   dim          : 768 512 3 
##   frames.total : 3 
##   frames.render: 1

Image stacks

nuc = readImage(system.file("images", "nuclei.tif", package="EBImage"))
print(nuc, short = TRUE)
## Image 
##   colorMode    : Grayscale 
##   storage.mode : double 
##   dim          : 510 510 4 
##   frames.total : 4 
##   frames.render: 4
display(nuc)
## The image contains more than one frame: only the first one is displayed.
## To display all frames use 'all = TRUE'.

Image stacks

display(nuc, method = "raster", all = TRUE)

Manipulating images

Being numeric arrays, images can be conveniently manipulated by any of R's arithmetic operators.

Cropping

img = img[366:749, 58:441]

Inversion

img_neg = max(img) - img
display(img_neg)

Manipulating images

Brightness, contrast, and gamma correction

img_comb = combine(
  img,
  img + 0.3,
  img * 2,
  img ^ 0.5
)

display( tile(img_comb, numberOfFrames(img_comb), lwd = 20, fg.col = "white") )

Tresholding images

Binary images

Images which contain only two sets of pixels, which represent the background and the foreground pixels.

img_thresh = img > .5
display(img_thresh)

Tresholding images

Binary images

Images which contain only two sets of pixels, which represent the background and the foreground pixels.

img_thresh
## Image 
##   colorMode    : Grayscale 
##   storage.mode : logical 
##   dim          : 384 384 
##   frames.total : 1 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6]
##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE

Spatial transformations

Translation

img_translate = translate(img, v = c(100, -50))
display(img_translate)

Spatial transformations

Rotation

img_rotate = rotate(img, angle = 30, bg.col = "white")
display(img_rotate)

Spatial transformations

Scaling

img_resize = resize(img, w = 512, h = 256)
display(img_resize)
img_resize = resize(img, 256)
display(img_resize)

Spatial transformations

Vertical and horizontal reflection

display( flip(img) )
display( flop(img) )

Affine transformation

Described by a 3x2 transformation matrix \(\textbf{m}\)

\[\begin{bmatrix} x'_1 & y'_1 \\ x'_2 & y'_2 \\ \vdots & \vdots \\ \end{bmatrix} = \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ \vdots & \vdots & \ddots \\ \end{bmatrix} \times \textbf{m}\]
\[\textbf{m}_{\text{translation}}=\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ t_x & t_y \\ \end{bmatrix},\quad \textbf{m}_{\text{rotation}}=\begin{bmatrix} \cos{\theta} & \sin{\theta} \\ -\sin{\theta} & \cos{\theta} \\ 0 & 0 \\ \end{bmatrix},\quad \textbf{m}_{\text{scaling}}=\begin{bmatrix} W & 0 \\ 0 & H \\ 0 & 0 \\ \end{bmatrix}\]

Affine transformation

Example: horizontal sheer mapping

angle = pi/6
m = matrix(c(1, -sin(angle), sin(angle)*dim(img)[2]/2, 0, 1, 0), nrow = 3, ncol = 2)
m
##      [,1] [,2]
## [1,]  1.0    0
## [2,] -0.5    1
## [3,] 96.0    0
img_affine = affine(img, m)
display(img_affine)

Image transposition

imgcol_t = transpose(imgcol)
display(imgcol_t)

Linear filters

Removing local artifacts or noise – average in a rectangular window

\[ f'(x,y) = \frac{1}{N} \sum_{s=-a}^{a}\sum_{t=-a}^{a} f(x+s, y+t),\] where \(f(x,y)\) is the value of the pixel at position \((x, y)\), \(a\) determines the window size, which is \(2a+1\) in each direction. \(N=(2a+1)^2\) is the number of pixels averaged over, and \(f'\) is the new, smoothed image.

Generalization: weighted average using a weight function \(w\)

\[ (w * f)(x,y) = \sum_{s=-\infty}^{+\infty} \sum_{t=-\infty}^{+\infty} w(s,t)\, f(x+s, y+s) \]

  • convolution of the images \(f\) and \(w\), indicated by \(*\)
  • linear: for any two images \(f_1\), \(f_2\) and numbers \(c_1\), \(c_2\) \[w*(c_1f_1+c_2f_2)=c_1w*f_1 + c_2w*f_2\]

Generating the weight function

The weight function can be generated using makeBrush.

w = makeBrush(size = 31, shape = 'gaussian', sigma = 5)
plot(w[(nrow(w)+1)/2, ], ylab = "w", xlab = "")

Other available brush shapes: "box" (default), "disc", "diamond", "line"

Low-pass filtering

2D linear convolution is implemented by filter2 (uses FFT)

Example: Smoothing an image using a Gaussian filter of width 5

img_flo = filter2(img, w)
display(img_flo)

High-pass filtering

Detecting edges and sharpening images

fhi = matrix(1, nrow = 3, ncol = 3)
fhi[2, 2] = -8
fhi
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1   -8    1
## [3,]    1    1    1
img_fhi = filter2(img, fhi)
display(img_fhi)

Adaptive tresholding

Compensate for spatial dependencies of the background signal

disc = makeBrush(21, "disc")
disc = disc / sum(disc)
nuc = getFrame(nuc, 1)
nuc_bg = filter2(nuc, disc)
offset = 0.02
nuc_thresh = (nuc - nuc_bg) > offset

img_comb = combine(nuc, nuc_bg, nuc_thresh)
display( tile(img_comb, numberOfFrames(img_comb), lwd = 20, fg.col = "white") )

Median filter

Non-linear noise removal technique

  • particularly effective in the case of speckle noise
  • preserves edges
l = length(img)
n = l/10
img_noisy = img
img_noisy[sample(l, n)] = runif(n, min = 0, max = 1)
img_median = medianFilter(img_noisy, size = 1)

display( combine(img_noisy, img_median), all=TRUE)
Implemented using the constant time algorithm (Perreault and Hebert 2007).

Morphological operations

Non-linear filtering of binary images

  • erosion: for every fg pixel, put the mask around it, and if any pixel under the mask is from the bg, set all these pixels to bg.
  • dilation: for every bg pixel, put the mask around it, and if any pixel under the mask is from the fg, set all these pixels to fg.
leaf = readImage( system.file('images', 'leaf.png', package='BioC2015Oles') )
kern = makeBrush(size = 3)
morph = combine(
  leaf,
  erode(leaf, kern),
  dilate(leaf, kern)
)

displayTiles(morph)

Morphological operations

  • opening: erosion followed by dilation
    removes small objects from the background
  • closing: dilation followed by erosion
    fills holes in the foreground
morph = combine(
  leaf,
  opening(leaf, kern),
  closing(leaf, kern)
)

displayTiles(morph) 

Application in cell biology

Fluorescent microscopy images from two channels of HeLa cells.

dna = readImage(system.file("images", "nuclei.tif", package="EBImage"))
print(dna, short=TRUE)
## Image 
##   colorMode    : Grayscale 
##   storage.mode : double 
##   dim          : 510 510 4 
##   frames.total : 4 
##   frames.render: 4
tub = readImage(system.file("images", "cells.tif", package="EBImage"))
print(tub, short=TRUE)
## Image 
##   colorMode    : Grayscale 
##   storage.mode : double 
##   dim          : 510 510 4 
##   frames.total : 4 
##   frames.render: 4

Application in cell biology

display( combine(getFrame(dna, 3), getFrame(tub, 3)), all=TRUE )
rgb = rgbImage(green = 1.5 * tub, blue = dna)
displayTiles(rgb)

Nuclei segmentation

  • Adaptive thresholding of the DNA channel
  • Morphological opening and filling of holes
  • Distance map computation and watershed transformation

Adaptive thresholding and morphological filtering

nmaskt = thresh(dna, w = 15, h = 15, offset = 0.05)
nmaskf = fillHull( opening(nmaskt, makeBrush(5, shape='disc')) )

display( combine(getFrame(nmaskt, 3), getFrame(nmaskf, 3)), all=TRUE )

Nuclei segmentation

Distance map computation

dmap = distmap(nmaskf)
range(dmap)
## [1]  0.00000 14.31782
display(normalize(dmap), frame = 3)

Nuclei segmentation

Watershed transformation

nmask = watershed(dmap, tolerance = 2)

display( combine(
  toRGB( getFrame(dna, 3) ), 
  colorLabels( getFrame(nmask, 3) )
), all=TRUE )

bwlabel - a simpler and faster function for segmenting connected objects

Cytoplasm segmentation

Identification of cell bodies by Voronoi partitioning using the segmented nuclei as seeds.

cmaskt = closing( gblur(tub, 1) > 0.105, makeBrush(5, shape='disc') )

cmask  = propagate(tub, seeds=nmask, mask=cmaskt, lambda = 0.001)

display( combine(
  toRGB( getFrame(cmaskt, 3) ), 
  colorLabels( getFrame(cmask, 3) )
), all=TRUE )

Voronoi tesselation

Voronoi diagram is a partitioning of a plane into regions based on distance to some specific points of that plane (seeds).

Voronoi tesselation on a Riemann manifold

  • arbitrary shape (provided in mask)
  • arbitrary metric (dependent on image gradient \(z\))

Distance between points \((x, y, z)\) and \((x+dx, y+dy, z+dz)\) \[ ds = \sqrt{ \frac{2}{\lambda+1} \left[ \lambda \left( dx^2 + dy^2 \right) + dz^2 \right] } \] \(\lambda\) controls the weight of the Euclidean distance term

  • when \(\lambda\) is large, \(ds\) tends to the Euclidean distance
  • for small \(\lambda\), \(ds\) tends to the intensity gradient of the image

Final segmentation

display( paintObjects(nmask,
            paintObjects(cmask, rgb, col = "magenta", thick = TRUE),
         col = "yellow", thick = TRUE), all = TRUE)

Individual cells stacked

st = stackObjects(cmask, rgb)

display(st, all = TRUE)

Feature extraction

Quantitative cellular descriptors

  • basic statistics on pixel intensities computeFeatures.basic
  • shape characteristics (area, perimeter, radius) computeFeatures.shape
  • moments (center of mass, eccentricity, …) computeFeatures.moment
  • Haralick textural features computeFeatures.haralick
head( computeFeatures.shape(cmask[,,1], tub[,,1]) )
##   s.area s.perimeter s.radius.mean s.radius.sd s.radius.min s.radius.max
## 1   4216         408      43.52530   14.254634    16.133188     73.48543
## 2   3079         263      34.60167    9.061043    19.146184     53.49633
## 3   3092         211      31.00933    4.636759    16.798016     38.91896
## 4   1091         161      21.16105    8.163347     7.496897     35.43516
## 5    355          78      10.65161    2.772033     4.568584     15.93175
## 6   2377         209      29.37053    8.030861    16.278131     44.85961

Automated cellular phenotyping

Using multivariate analysis methods we can:

  • detect cell subpopulations (clustering)
  • classify cells into pre-defined cell types or phenotypes (classification)
  • based on the frequencies of the subpopulations compare different biological conditions


Spatial statistics: point processes

Interaction between immune and cancer cells

Lymph nodes

  • immunologic filter composed of B cells, T cells, dendric cells (DC) and macrophages
  • clinical significance in cancer staging (the sentinel lymph node)

Breast cancer lymph node biopsy (Setiadi et al. 2010).

Cell typing was done by staining the cells with protein antibodies that provide specific signatures.

Lymph node biopsies

data(brcalymphnode, package = "BioC2015Oles")
head(brcalymphnode)
##     class    x     y
## 1 T_cells 6355 10382
## 2 T_cells 6356 10850
## 3 T_cells 6357 11070
## 4 T_cells 6357 11082
## 5 T_cells 6358 10600
## 6 T_cells 6361 10301
nrow(brcalymphnode)
## [1] 209462
table(brcalymphnode$class)
## 
##     T_cells       Tumor         DCs other_cells 
##      103681       27822         878       77081

Spatial distribution of cells

Plot of the x and y positions of the T cells (left) and tumor cells (right)

par(mfrow = c(1, 2), mar = c(4.5, 4.5, 0.5, 0.5))
xlim = range(brcalymphnode$x)
ylim = range(brcalymphnode$y)
cols = c(`T_cells` = "dodgerblue4", `Tumor` = "darkmagenta")
for(i in seq_along(cols))
  plot(subset(brcalymphnode, class==names(cols)[i])[, c("x", "y")], 
       pch = ".", asp = 1, xlim = xlim, ylim = ylim, col = cols[i])

Analysis of spatial point patterns

Marked spatial point process:

  • set of isolated points located in a mathematical space (xlim, ylim)
  • points marked with certain properties (factor class)
library("spatstat")

ln = with(brcalymphnode, 
  ppp(x = x, y = y, marks = class, xrange = xlim, yrange = ylim)
)

ln
## Marked planar point pattern: 209462 points
## Multitype, with levels = T_cells, Tumor, DCs, other_cells 
## window: rectangle = [3839, 17276] x [6713, 23006] units

Convex hull

chull = convexhull(ln)

par(mar = c(0, 0, 0, 0))
plot(Window(ln), main = NULL, asp = 1)
plot(chull, lty = 2, col = "lightblue", add = TRUE)
Window(ln) = chull
ln
## Marked planar point pattern: 209462 points
## Multitype, with levels = T_cells, Tumor, DCs, other_cells 
## window: polygonal boundary
## enclosing rectangle: [3839, 17276] x [6713, 23006] units

First order effects: the intensity

  • Number of points lying in a circle of area \(a\) around a point \(p=(x,y)\) \[N(p, a)\]

  • Local intensity \[\lambda(p) = \lim_{a\rightarrow 0} \frac{E[ N(p, a)]}{a}\]

  • For a stationary process, i.e., homogeneous all over the region \[\lambda(p) = \text{const.}\] Then the intensity in an area \(A\) is proportional to the area \[E(N(\cdot, A)) = \lambda A\]

Estimating the intensity

Kernel smoothed intensity estimate of a point pattern (Diggle 1985)
\[\hat{\lambda}(p) = \sum_i e(p_i) K(p-p_i)\] where \(K\) is the Gaussian smoothing kernel, and \(e(p_i)\) is an edge correction factor.
densities = solist( 
  `Diggle's edge correction` = density(subset(ln, marks=="Tumor"), diggle = TRUE),
  `No edge correction`       = density(subset(ln, marks=="Tumor"), edge = FALSE)
)
plot(densities, equal.ribbon = TRUE, col = topo.colors, main = "")

Probability of cell classes

Estimate of the spatially-varying probability of a particular event type

rr = relrisk(ln, sigma = 250)

plot(rr, equal.ribbon = TRUE, col = topo.colors, nrows = 1, main = "")

Second order effects: clustering

Spatial clustering (or anticlustering) of points: whether and to what extent neighboring points are more/less dense than expected by chance.

Poisson process

  • stationary with intensity \(\lambda\)
  • no dependency between occurrences of objects in non-overlapping regions
  • \(N(p, A)\) follows a Poisson distribution with rate \(\lambda A\)

The nearest neighbour (NN) distance distribution function \(G\)

Cumulative distribution function of the distance \(W\) from a typical random point to its nearest neighbor. For a Poisson process

\[G(r) = P(W \leq r) = 1-e^{-\lambda \pi r^2}\]

Estimation of the NN distance function G

gln = Gest(ln)
gln
## Function value object (class 'fv')
## for the function r -> G(r)
## .....................................................................
##         Math.label      Description                                  
## r       r               distance argument r                          
## theo    G[pois](r)      theoretical Poisson G(r)                     
## han     hat(G)[han](r)  Hanisch estimate of G(r)                     
## rs      hat(G)[bord](r) border corrected estimate of G(r)            
## km      hat(G)[km](r)   Kaplan-Meier estimate of G(r)                
## hazard  hat(h)[km](r)   Kaplan-Meier estimate of hazard function h(r)
## theohaz h[pois](r)      theoretical Poisson hazard function h(r)     
## .....................................................................
## Default plot formula:  .~r
## where "." stands for 'km', 'rs', 'han', 'theo'
## Recommended range of argument r: [0, 20.998]
## Available range of argument r: [0, 52.443]

Estimation of the NN distance function G

library(RColorBrewer)
par(mar = c(4.5, 4.5, 0.5, 0.5))

plot(gln, lty = 1, col = brewer.pal(4, "Set1"), main = "")
  • effect of finite cell size
  • clustering at large distances

Ripley's K-function

For a stationary point process \(\lambda K(r)\) is the expected number of additional points within a distance \(r\) of a given, randomly picked point.

Generalization to inhomogenous point processes

\[K_{\text{inhom}}(r) = \sum_{i,j} {\mathbf 1}_{d(p_i, p_j) \le r} \times \frac{e(p_i, p_j, r)} { \lambda(x_i) \lambda(x_j) },\]

where \(d(p_i, p_j)\) is the distance between points \(p_i\) and \(p_j\), and \(e(p_i, p_j, r)\) is an edge correction factor.

More useful for estimation and visualization is the \(L\)-function
\[L(r)=\sqrt{\frac{K(r)}{\pi}}.\]

For a Poisson point pattern, the theoretical value is \(L(r) = r\).

Estimate of the inhomogeneous L-function

Lln = Linhom( subset(ln, marks=="T_cells") )
Lln
## Function value object (class 'fv')
## for the function r -> L[inhom](r)
## ...........................................................................
##            Math.label                
## r          r                         
## theo       L[pois](r)                
## border     {hat(L)[inhom]^{bord}}(r) 
## bord.modif {hat(L)[inhom]^{bordm}}(r)
##            Description                                      
## r          distance argument r                              
## theo       theoretical Poisson L[inhom](r)                  
## border     border-corrected estimate of L[inhom](r)         
## bord.modif modified border-corrected estimate of L[inhom](r)
## ...........................................................................
## Default plot formula:  .~r
## where "." stands for 'bord.modif', 'border', 'theo'
## Recommended range of argument r: [0, 694.7]
## Available range of argument r: [0, 694.7]

Estimate of the inhomogeneous L-function

par(mar = c(4.5, 4.5, 0.5, 0.5))

plot(Lln, lty = 1, col = brewer.pal(3, "Set1"), main = "")

Pair correlation function

Describes how point density varies as a function of distance from a reference point.
\[g(r)=\frac{1}{2\pi r}\frac{dK}{dr}(r)\]

For a stationary Poisson process \(g(r) = 1\).

  • \(g(r) < 1\) – inhibition between points
  • \(g(r) > 1\) – clustering

Pair correlation function

pcfln = pcf( Kinhom(subset(ln, marks=="T_cells")) )

plot(pcfln, lty = 1, log = "x")

References

Diggle, P.J. 1985. “A Kernel Method for Smoothing Point Process Data.” Applied Statistics (Journal of the Royal Statistical Society, Series C) 34: 138–47.

Haralick, R. M., K. Shanmugam, and Its’Hak Deinstein. 1979. “Textural Features for Image Classification.” IEEE Transactions on Systems, Man and Cybernetics.

Jones, T., A. Carpenter, and P. Golland. 2005. “Voronoi-Based Segmentation of Cells on Image Manifolds.” CVBIA05, 535–43.

Pau, Gregoire, Florian Fuchs, Oleg Sklyar, Michael Boutros, and Wolfgang Huber. 2010. “EBImage – an R package for image processing with applications to cellular phenotypes.” Bioinformatics 26: 979–81.

Perreault, S., and P. Hebert. 2007. “Median Filtering in Constant Time.” IEEE Trans Image Process 16 (9): 2389–94.

Setiadi, A Francesca, Nelson C Ray, Holbrook E Kohrt, Adam Kapelner, Valeria Carcamo-Cavazos, Edina B Levic, Sina Yadegarynia, et al. 2010. “Quantitative, Architectural Analysis of Immune Cell Subsets in Tumor-Draining Lymph Nodes from Breast Cancer Patients and Healthy Lymph Nodes.” PLoS One 5 (8): e12420.