Computational Biology

October 10, 2020

Todo

~~Metric and additive distance matrices.~~
CLUSTALW.
BLAST algorithm.
You should feel comfortable formulating an ODE network from a diagram of a gene regulatory network.
Gillespie’s Method.
- Draw 2 random numbers, one for time of next event, other for which event occurs.
Smoluchowski Equation and Diffusion.
Michaelis-Menten (MM) Kinetics.
Numerical details of ODE solvers are not required, but need to know what ‘stiff’ equations are and the problems they cause.
Metropolis MCMC.
Cell tracking in inverted microscopy will be relevant.
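
The two random draws in Gillespie's method listed above can be sketched as follows. This is a minimal illustration on a birth-death process chosen here as an assumption (the notes do not name a specific system); the function name and the rate parameters `k_birth` and `k_death` are hypothetical.

```python
import math
import random


def gillespie_birth_death(k_birth, k_death, n0, t_end, rng=None):
    """Simulate 0 -> X (rate k_birth) and X -> 0 (rate k_death * n)."""
    rng = rng or random.Random()
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        rates = [k_birth, k_death * n]  # propensity of each event
        total = sum(rates)
        if total == 0:
            break  # nothing can happen any more
        # First random number: exponentially distributed time to next event.
        r1 = 1.0 - rng.random()  # lies in (0, 1], so log(r1) is defined
        t += -math.log(r1) / total
        if t > t_end:
            break
        # Second random number: choose the event in proportion to its rate.
        r2 = rng.random() * total
        if r2 < rates[0]:
            n += 1  # birth
        else:
            n -= 1  # death
        times.append(t)
        counts.append(n)
    return times, counts
```

Calling `gillespie_birth_death(10.0, 1.0, 0, 5.0)` returns one stochastic trajectory; repeated runs give different trajectories that fluctuate around the deterministic ODE solution.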

Alignment

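NWA vs SWA

NWA (Needleman-Wunsch algorithm) computes a globally optimal alignment over the full length of both sequences.
SWA (Smith-Waterman algorithm) computes a locally optimal alignment.
- That is, it finds the pair of subsequences, one from each sequence, that align with the highest score.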
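
The difference between the two is small in code. The sketch below computes only the optimal score (no traceback) with a linear gap penalty; the scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the function name are illustrative assumptions, not values taken from the course.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Optimal alignment score of strings a and b.

    local=False: global alignment (Needleman-Wunsch).
    local=True:  local alignment (Smith-Waterman).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    # Global: score of aligning everything. Local: best cell in the table.
    return best if local else score[-1][-1]
```

For two sequences that share only a central motif, the local score stays high while the global score is pulled down by the unrelated flanks.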

Suffix Trees

Tree constructed by inserting every suffix of the input string (starting with the whole string); edges are labelled with characters of the string.
Allows substring checking in time proportional to the length of the (shorter) query string.
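
A minimal sketch of the idea, using an uncompressed suffix trie built naively in quadratic time (a simplification assumed here; a real suffix tree compresses single-child paths and can be built in linear time). The function names are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a nested-dict trie."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})  # one edge per character
    return root


def contains(trie, pattern):
    """Check whether pattern is a substring; cost is O(len(pattern))."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True
```

Every substring is a prefix of some suffix, which is why walking down from the root is enough to answer the query.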

Neighbour Joining Method

See existing notes Lecture4. Also could be useful to look at implementation in code for coursework.

Modelling Single-Cell Dynamics

Computational Pathology

Slide Staining

Cells and other extracellular material making up most tissue are colourless. Therefore, staining is used to provide contrast.

Hematoxylin and Eosin (H&E) are the most commonly used stains.
- But, H&E staining makes it difficult to distinguish between different kinds of nuclei, or to assess the expression of various proteins/genes.
Immunohistochemical (IHC) stains are used for this reason.

Slide Scanner Considerations

Modern scanners are good, but the scanning process can still result in images which are out of focus in some areas.

Blur Estimation - Principle is in-focus images have clearly defined edges, out-of-focus images have no hard edges.
Can use the ‘blur effect’ method.
- Artificially blur the image.
- Compare the intensity of each pixel with its neighbours before and after blurring the image to form blur measure.
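
The steps above can be sketched as follows. This is a simplified version under stated assumptions: it works on a greyscale image given as a list of rows, blurs and compares along the horizontal direction only, and uses a box filter; the function names and the `radius` parameter are hypothetical.

```python
def box_blur_rows(img, radius=4):
    """Blur each row with a mean filter (window clipped at the borders)."""
    out = []
    for row in img:
        n = len(row)
        blurred = []
        for j in range(n):
            lo, hi = max(0, j - radius), min(n, j + radius + 1)
            blurred.append(sum(row[lo:hi]) / (hi - lo))
        out.append(blurred)
    return out


def neighbour_variation(img):
    """Absolute difference between each pixel and its left neighbour."""
    return [[abs(row[j] - row[j - 1]) for j in range(1, len(row))] for row in img]


def blur_measure(img, radius=4):
    """Return a value in [0, 1]: near 0 for sharp, near 1 for blurred input."""
    d_orig = neighbour_variation(img)
    d_blur = neighbour_variation(box_blur_rows(img, radius))
    total = sum(sum(r) for r in d_orig)
    if total == 0:
        return 1.0  # completely flat image: no edges at all
    # Variation destroyed by re-blurring; large when the input was sharp.
    lost = sum(
        max(0.0, a - b)
        for row_o, row_b in zip(d_orig, d_blur)
        for a, b in zip(row_o, row_b)
    )
    return (total - lost) / total
```

A sharp image loses most of its neighbour variation when blurred again, so its measure is low; an already blurred image barely changes, so its measure is close to 1.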

JPEG2000

Process is:

Image Tiling.
Wavelet Transform.
Quantisation.
Entropy Encoding.

To decode, process is reversed.

Wavelets are similar to Laplacian pyramids - signal can be represented as a sum of an approximation at coarse resolution with added detail.
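
To make the "coarse approximation plus detail" idea concrete, here is one level of a 1D Haar transform. Haar is used only because it is the simplest wavelet (JPEG 2000 itself uses the CDF 5/3 and 9/7 wavelets); the function names are hypothetical.

```python
def haar_decompose(signal):
    """Split an even-length signal into pairwise averages and differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / 2 for i in pairs]  # coarse version
    detail = [(signal[i] - signal[i + 1]) / 2 for i in pairs]  # what was lost
    return approx, detail


def haar_reconstruct(approx, detail):
    """Invert haar_decompose exactly."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out
```

For a smooth signal most detail coefficients are close to zero, which is what makes the representation sparse and easy to quantise.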

Advantages over the classic Fourier transform (FT) are:

Information about both time and frequency.
Inherent multiresolution nature, which is attractive in many signal and image processing applications.
Sparse representation.
Suitable for analysis of non-stationary signals.

Tiles are coded such that each tile can be decoded separately.

Entropy Filtering

Entropy is a measure of disorder.

Entropy can be considered as a measure of activity (variation in pixel intensities).
Using sliding windows, can compute the local entropy of pixel intensities.
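
A minimal sketch of sliding-window entropy on a greyscale image stored as a list of rows. The window is clipped at the image border, and the `radius` parameter and function names are assumptions made for illustration.

```python
import math
from collections import Counter


def window_entropy(values):
    """Shannon entropy (in bits) of the intensity values in one window."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


def local_entropy(img, radius=1):
    """Entropy of the (2*radius+1)^2 neighbourhood around every pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = window_entropy(window)
    return out
```

Flat background regions give entropy 0, while textured tissue gives higher values, so thresholding the entropy map is one way to separate the two.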

Otsu Thresholding

Assuming a bimodal distribution, we aim to find a threshold which separates tissue from the background.
Optimal threshold is the one which maximises the inter-class variance.
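
A sketch of the search for that threshold, assuming integer intensities in `0..levels-1` supplied as a flat list. It maximises `w0 * w1 * (mu0 - mu1)^2`, which is proportional to the inter-class variance; the function name is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return t such that pixels <= t form one class and pixels > t the other."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break  # upper class empty from here on
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```

Because only the histogram is used, the cost of the search is independent of image size once the histogram has been built.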

Beer-Lambert Law

Beer’s Law - Light absorbed is proportional to concentrations of the attenuating species in the material sample.
Lambert’s Law - Light absorbed by a material sample is proportional to its thickness.

Stain Separation/Deconvolution - Ruifrok-Johnston (RJ) Method

The RGB values of image intensity cannot directly be used for stain measurement due to a nonlinear relationship between them and the stain concentration.

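So, we define Optical Density (OD) as:

$d = \alpha c x = -\log_{10} (I / I_0)$

where $\alpha$ is the absorption coefficient of the stain, $c$ its concentration, $x$ the sample thickness, $I$ the transmitted intensity, and $I_0$ the incident intensity.

The stain matrix is defined on slide 35 of the revision lecture slides.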

The observed optical density is a linear combination of the stain concentrations present in the sample.

Advantages: Simple and cheap to calculate.
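
A sketch of the separation step, assuming NumPy is available. The stain matrix below uses the OD vectors commonly quoted for the Ruifrok-Johnston method (rows H, E, DAB); it is an assumption here and may differ from the matrix on the lecture slide. `I_0 = 255` assumes 8-bit images.

```python
import numpy as np

# Assumed stain OD vectors (rows: Hematoxylin, Eosin, DAB), one column per RGB channel.
STAIN_MATRIX = np.array([
    [0.65, 0.70, 0.29],
    [0.07, 0.99, 0.11],
    [0.27, 0.57, 0.78],
])


def rgb_to_od(rgb, i0=255.0):
    """Optical density per channel: -log10(I / I0)."""
    rgb = np.maximum(np.asarray(rgb, dtype=float), 1.0)  # avoid log(0)
    return -np.log10(rgb / i0)


def separate_stains(rgb, stain_matrix=STAIN_MATRIX, i0=255.0):
    """Solve OD = C @ M for the stain concentrations C at every pixel."""
    m = stain_matrix / np.linalg.norm(stain_matrix, axis=1, keepdims=True)
    od = rgb_to_od(rgb, i0).reshape(-1, 3)
    conc = od @ np.linalg.inv(m)
    return conc.reshape(np.shape(rgb))
```

The last axis of the result holds the estimated H, E, and DAB concentrations; the whole operation is one logarithm and one 3x3 matrix product per pixel, which is why the method is cheap.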

Stain Vector Estimation

Manual computation of stain vectors could be tedious and inefficient, and assumes we know the image stains beforehand.

An alternative strategy is to use a machine learning framework that assigns each pixel a probability of belonging to H or to E.

Stain Variability

Variability results from methods and protocols used to prepare the specimen.

Type and duration of fixation.
Consistency and thickness of sections.
Room or specimen/dye temperatures.
Stain reactivity for different manufacturers/batches.
Incubation times.
Scanner equipment/settings.

Stain Normalisation

Involves ‘normalising’ the stain colour distribution according to some reference image.

Histogram Matching

Idea is to change the histogram of the source image to that of the reference image.

Calculate source and reference image normalised histograms.
Change source histogram such that it matches (approximately) the reference image histogram.
Repeat for each colour channel.
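
One common way to carry out the matching step is through cumulative histograms (CDFs): each source level is mapped to the first reference level whose CDF is at least as large. The sketch below handles a single channel given as a flat list of integer intensities; the function names are hypothetical.

```python
def cdf(values, levels=256):
    """Normalised cumulative histogram of integer intensities."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total, running, out = len(values), 0, []
    for h in hist:
        running += h
        out.append(running / total)
    return out


def match_histogram(source, reference, levels=256):
    """Remap source intensities so their distribution follows the reference."""
    src_cdf, ref_cdf = cdf(source, levels), cdf(reference, levels)
    mapping, j = [], 0
    for s in range(levels):
        # Both CDFs are non-decreasing, so j only ever moves forward.
        while j < levels - 1 and ref_cdf[j] < src_cdf[s]:
            j += 1
        mapping.append(j)
    return [mapping[v] for v in source]
```

For an RGB image the same function would be applied to the R, G, and B channels independently.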

Colour Transfer

1. Convert both source and reference images to an uncorrelated colour space (e.g. Lab).
2. Subtract the mean of the source image from all data points of the source image.
3. Scale all data points of the source image by the ratio of the reference standard deviation to the source standard deviation.
4. Add the mean of the reference image to each data point of the source image.
5. Repeat steps 2-4 for each channel.
6. Convert the result back to RGB space.
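
The per-channel statistics matching (subtract mean, rescale, add reference mean) can be sketched as below. The colour-space conversions are deliberately left out: the inputs are assumed to already be channels in an uncorrelated space such as Lab, each given as a flat list of values. The function names are hypothetical.

```python
from statistics import fmean, pstdev


def transfer_channel(source, reference):
    """Give the source channel the mean and standard deviation of the reference."""
    src_mean, src_std = fmean(source), pstdev(source)
    ref_mean, ref_std = fmean(reference), pstdev(reference)
    scale = ref_std / src_std if src_std > 0 else 1.0  # flat channel: shift only
    return [(v - src_mean) * scale + ref_mean for v in source]


def transfer_colour(source_channels, reference_channels):
    """Apply transfer_channel to each pair of corresponding channels."""
    return [transfer_channel(s, r) for s, r in zip(source_channels, reference_channels)]
```

After the transfer, every source channel has the same first- and second-order statistics as the corresponding reference channel.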

PCA in the Optical Density Space

RGB to linear space (Optical density space, OD).
PCA on the OD channels.
Project data points on the top 2 principal components and unit normalise.
Calculate angle of each point in the projection with the 1st principal component.
Find the robust extremes (1st and 99th percentile) of angles as they represent the stain colours.
Convert robust extremes back to OD space.
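
A sketch of these steps, assuming NumPy. Two details are assumptions added here rather than taken from the notes: near-transparent pixels are dropped with a hypothetical `od_threshold` before PCA, and the sign of each principal component is flipped so the data project onto its positive side (eigenvector signs are otherwise arbitrary).

```python
import numpy as np


def estimate_stain_vectors(rgb, i0=255.0, od_threshold=0.15, low=1, high=99):
    """Estimate two unit-length stain vectors in OD space from an RGB image."""
    pixels = np.asarray(rgb, dtype=float).reshape(-1, 3)
    od = -np.log10(np.maximum(pixels, 1.0) / i0)  # RGB -> optical density
    od = od[(od > od_threshold).any(axis=1)]  # assumed background removal

    # PCA on the OD channels; eigh returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(np.cov(od, rowvar=False))
    pcs = eigvecs[:, [2, 1]]  # columns: 1st and 2nd principal component
    proj = od @ pcs

    # Orient both components so the mean projection is non-negative.
    signs = np.where(proj.mean(axis=0) < 0, -1.0, 1.0)
    pcs, proj = pcs * signs, proj * signs

    # Angle to the 1st component; unit normalisation does not change it.
    angles = np.arctan2(proj[:, 1], proj[:, 0])
    lo, hi = np.percentile(angles, [low, high])

    # Map the robust extreme angles back to OD space.
    vectors = np.array([pcs @ np.array([np.cos(a), np.sin(a)]) for a in (lo, hi)])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
```

The two returned rows can be used as the stain matrix for colour deconvolution; which row corresponds to which stain has to be decided separately.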

Warwick-Leeds Stain Normalisation

For overview see revision slides 46 and 47.

Basic Thresholding Cell Detection

Can be achieved by Otsu thresholding on the H channel.
Can also use thresholding on the so-called Blue Ratio Image $BR = 100(\frac{B}{1+R+G})(\frac{256}{1+R+G+B})$.
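
The Blue Ratio formula translates directly into code. The sketch assumes 8-bit RGB values; the function names and the idea of passing the threshold in as a parameter (for example one chosen by Otsu's method) are illustrative assumptions.

```python
def blue_ratio(r, g, b):
    """Blue Ratio of one pixel: large for blue-dominant (hematoxylin-rich) pixels."""
    return 100.0 * (b / (1 + r + g)) * (256.0 / (1 + r + g + b))


def blue_ratio_mask(img, threshold):
    """Mark pixels of an RGB image (rows of (r, g, b) tuples) above the threshold."""
    return [[blue_ratio(r, g, b) > threshold for (r, g, b) in row] for row in img]
```

Bluish-purple nuclei score much higher than pink stroma or white background, which is what makes a single threshold workable.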

Otsu Thresholding

Assuming a bimodal distribution, we aim to find a threshold that separates nuclei from the background.
In other words, an optimal threshold is the one that maximises the inter-class variance.
Does not take into account spatial info.

Handcrafted Features vs Deep Learning

With handcrafted features, typically each step needs to be designed and trained separately. Potentially more difficult and time consuming.

Several options available:

Shape features.
Textural features.
Frequency domain features.

With Deep Learning, no need to separately design and train, but requires large amounts of data.

IHC Staining

IHC markers may be expressed in particular cell organelles, such as the nucleus, cell membrane, cytoplasm, or a combination thereof.
Major Advantage of IHC Staining - Allows the observer a greater insight into the chemical makeup of tissue.
- Composition of tumour/immune cells.
- Used directly and indirectly for determining the degree of malignancy.
IHC scoring plays an important role in diagnosis and treatment of some particular types of cancer.
Automated quantification is fast, objective, and reproducible.
Stain separation is first step in IHC scoring.
Even though the DAB stain does not follow Beer-Lambert law, several studies have shown automated IHC quantification gives good concordance with IHC scoring by human experts.

Nuclear Marker Scoring

Needed when the protein of interest is expressed in the nuclei.
Some of the most commonly used IHC markers are nuclear markers.

Details of ER/PR, H-score, and Allred scoring are in slides 65-67 of the revision lecture.

Membranous Marker Scoring

Considered to be more difficult than nuclear marker scoring.
Human epidermal growth factor receptor 2 (HER2) is overexpressed in nearly 15% of early invasive breast cancer patients.
HER2 scoring is used as a predictive test for targeted anti-HER2 therapy such as Herceptin.

Machine Learning

Slides start at 70 in revision lecture.

Data Augmentation

Used for increasing the amount of data.
Common methods include:
- Mirroring.
- Random cropping.
- Rotations.
- Warping.
- Addition of noise.
- Colour changes.
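
A few of these augmentations, sketched on a greyscale image stored as a list of rows of 8-bit integers. The function names are hypothetical, and passing in a `random.Random` instance is a design choice made here so results are reproducible.

```python
import random


def mirror(img):
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def random_crop(img, height, width, rng):
    """Cut a height x width patch at a random position."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]


def add_noise(img, sigma, rng):
    """Add Gaussian noise and clamp back to the 0..255 range."""
    return [
        [min(255, max(0, round(v + rng.gauss(0.0, sigma)))) for v in row]
        for row in img
    ]
```

Because tissue has no canonical orientation, flips and rotations are label-preserving for most pathology tasks, which makes them cheap ways to enlarge a training set.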

Segmentation

The task of dividing an image into various components by assigning semantic labels to individual pixels based on their regional characteristics.

In case of computational pathology, usually involves segmentation into:

Low-Level Segments - Nuclei.
Mid-Level Segments - Multi-cellular objects, e.g. glands, vessels, nerves.
High-Level Segments - Large more-or-less homogeneous areas (e.g. tumour-rich areas).