Alexander Logan

Digital Forensics

Contents:

  • Image Acquisition
  • Image Enhancement
  • Digital Watermarking
  • JPEG Compression
  • Compression-Based Forensics
  • Copy-Move Forgery
  • Sensor Forensics
  • Video Forensics

Image Acquisition

Gamma Correction

Cameras follow a linear relationship between actual and reported brightness, whereas our eyes respond nonlinearly. Gamma correction is used to account for this: it translates actual luminance into perceived luminance.
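
A minimal sketch of this correction, assuming a simple power-law model with $\gamma = 2.2$ (the function names are illustrative):

```python
import numpy as np

def gamma_correct(linear: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Map linear luminance (0..1) to perceptually encoded values."""
    return np.clip(linear, 0.0, 1.0) ** (1.0 / gamma)

def gamma_expand(encoded: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Invert the correction: recover linear luminance from encoded values."""
    return np.clip(encoded, 0.0, 1.0) ** gamma

# Dark values are boosted most, matching our eyes' greater sensitivity there.
print(gamma_correct(np.linspace(0.0, 1.0, 5)))
```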

Imaging Pipeline

  1. World
  2. Lens
  3. CFA
  4. Imaging Sensor
  5. Post Processing
  6. Digital Image
  7. Compression

Colour Models

RGB

Y’UV

Chroma Subsampling

Comparing Images

Mean Squared Error (MSE)

For two images $X$ and $Y$, MSE is defined as follows:

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2$$

Where $x_i$ and $y_i$ are the pixels of $X$ and $Y$, and $N$ is the total number of pixels.

Both images must be of the same size to be able to compute the MSE.
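
A short NumPy sketch of this definition (the size check reflects the constraint above):

```python
import numpy as np

def mse(x: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error between two equally sized images."""
    if x.shape != y.shape:
        raise ValueError("images must be the same size")
    diff = x.astype(np.float64) - y.astype(np.float64)
    return float(np.mean(diff ** 2))
```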

Correlation Coefficient

The most popular method is Pearson's $r$, which measures the linear correlation between two variables. It can be computed as:

$$r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2} \sqrt{\sum_{i} (y_i - \bar{y})^2}}$$

Often $r^2$ is used instead of $r$.
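
A small sketch using NumPy's built-in correlation, treating each image as a flat vector:

```python
import numpy as np

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson's r between two same-sized images, flattened to vectors."""
    return float(np.corrcoef(x.ravel(), y.ravel())[0, 1])

# r_squared = pearson_r(a, b) ** 2
```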

Structural Similarity (SSIM)


Image Enhancement

We often need to enhance images for forensic applications. Enhancements include improving illumination, contrast, removing unwanted noise, and sharpening.

Enhancement in the pixel (spatial) domain involves working directly on pixel values.

Spatial Domain

Histograms

Given an image with $L$ grey levels, an image histogram $h(r_k)$ is a discrete function defined as:

$$h(r_k) = n_k$$

Where $r_k$ denotes the $k$th grey level, $n_k$ denotes the number of pixels with grey level $r_k$, and $k = 0, 1, \cdots, L-1$. A normalised histogram is a discrete function which defines a probability distribution:

$$p(r_k) = \frac{n_k}{N}$$

Where $N$ is the total number of pixels.

The contrast of an image can be enhanced using Histogram Equalisation. Ideally, a high-contrast image should have a flat histogram spanning the entire intensity range.

This idea can be generalised - it is possible to transform a histogram into any other histogram using cumulative distributions. This is called histogram matching. An outline of the implementation (a sketch follows the list) is:

  1. Given two images, compute normalised histograms for both.
  2. Compute CDF for both.
  3. Replace each intensity level $x_i$ in input image to $x_j$:
    1. Find cumulative value for that pixel intensity in input CDF.
    2. Find corresponding pixel intensity for the same cumulative value in the reference CDF.
    3. Replace pixel intensity with this corresponding intensity.
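
A compact NumPy sketch of the outline above, assuming 8-bit greyscale inputs (histogram equalisation is the special case of matching against a flat reference):

```python
import numpy as np

def histogram_match(src: np.ndarray, ref: np.ndarray, levels: int = 256) -> np.ndarray:
    """Remap src's grey levels so its histogram approximates ref's."""
    src_hist, _ = np.histogram(src, bins=levels, range=(0, levels))
    ref_hist, _ = np.histogram(ref, bins=levels, range=(0, levels))
    src_cdf = np.cumsum(src_hist) / src.size      # step 2: CDFs
    ref_cdf = np.cumsum(ref_hist) / ref.size
    # Step 3: for each source level, find the reference level whose
    # cumulative value matches (first not-smaller entry).
    mapping = np.searchsorted(ref_cdf, src_cdf).clip(0, levels - 1)
    return mapping[src].astype(np.uint8)
```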

Noise Removal

Frequency Domain

Enhancement can also be done in the Frequency Domain.

Fourier transform - any function can be expressed in terms of sinusoids of varying frequency. For images, we can compute a 2D discrete Fourier transform (DFT).

Noise Removal in Frequency Domain

Discrete Cosine Transform (DCT)

The DCT is similar to the DFT, but the DCT uses weighted sums of cosines only - it is therefore a real rather than complex transformation.

The (0, 0) component of a DCT is called the DC component, and controls the overall brightness of the image.

DCT is popular for compression, as high frequency components can be removed without visually altering the appearance of the image too much.
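
A small sketch illustrating this with scipy.fft: for a smooth 8x8 block, discarding the high-frequency DCT coefficients barely changes the reconstruction (the block and the keep-mask are illustrative):

```python
import numpy as np
from scipy.fft import dctn, idctn

# A smooth (low-frequency) 8x8 block, e.g. a brightness ramp.
block = np.add.outer(np.arange(8.0), np.arange(8.0)) * 8.0

coeffs = dctn(block, norm="ortho")                    # 2D DCT-II
mask = np.add.outer(np.arange(8), np.arange(8)) < 8   # keep low frequencies only
approx = idctn(coeffs * mask, norm="ortho")           # reconstruct from kept terms

print(np.abs(block - approx).max())   # tiny for smooth content
```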


Digital Watermarking

Digital watermarking is a technique which can be used to help determine if an image is authentic. It is an active approach (as it involves inserting a signature pattern into data before it is distributed).

There are several possible application areas:

Types of Watermarks

Bitplane Substitution

Selecting Embedding Locations

To create a visible watermark, we can substitute the more significant bit planes of the watermark into the LSB plane(s) of the cover.
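
A sketch of this substitution for 8-bit greyscale arrays (n_planes, the number of planes swapped, is an illustrative parameter):

```python
import numpy as np

def embed_lsb(cover: np.ndarray, watermark: np.ndarray, n_planes: int = 1) -> np.ndarray:
    """Replace the n LSB planes of the cover with the n MSB planes of the
    watermark (both uint8 and the same shape)."""
    keep = cover & ~np.uint8((1 << n_planes) - 1)   # clear the cover's LSBs
    mark = watermark >> np.uint8(8 - n_planes)      # take the watermark's MSBs
    return keep | mark
```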

Watermarking by bitplane substitution is:

Additive Watermark

Watermarking in Frequency Domain

Spread Spectrum Encoding

  1. Compute the 2D DCT of the image.
  2. Identify the $n$ largest coefficients, excluding the DC component, to construct the vector $h$.
  3. Create the watermark as a vector $w$ whose components are sampled from a Gaussian distribution.
  4. Change the $n$ largest coefficients identified in Step 2 by $h_i^* = h_i (1 + \alpha w_i)$.
  5. Compute the inverse 2D DCT using the modified spectrum to get the watermarked image.
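
A sketch of these five steps using scipy.fft for the 2D DCT (n, alpha and the RNG seed are illustrative parameters):

```python
import numpy as np
from scipy.fft import dctn, idctn

def ss_embed(image: np.ndarray, n: int = 1000, alpha: float = 0.1, seed: int = 0):
    """Embed a Gaussian spread-spectrum watermark in the n largest DCT coefficients."""
    coeffs = dctn(image.astype(np.float64), norm="ortho")      # step 1
    mags = np.abs(coeffs).ravel()
    mags[0] = -np.inf                        # step 2: exclude the DC component
    idx = np.argsort(mags)[-n:]              # indices of the n largest coefficients
    w = np.random.default_rng(seed).normal(size=n)   # step 3: Gaussian watermark
    mod = coeffs.ravel().copy()
    mod[idx] *= 1.0 + alpha * w              # step 4: h_i* = h_i (1 + alpha w_i)
    marked = idctn(mod.reshape(coeffs.shape), norm="ortho")    # step 5
    return marked, w, idx
```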

Spread Spectrum Decoding

Requires access to the original un-watermarked image. Inverting the embedding equation, compute the recovered watermark as follows:

$$w_i^* = \frac{h_i^* - h_i}{\alpha h_i}$$

The recovered $w^*$ can then be compared with the original watermark $w$.
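
Continuing the sketch above, the decoder inverts the embedding equation; idx and alpha must match those used at embedding time:

```python
def ss_extract(marked, original, idx, alpha: float = 0.1):
    """Recover the watermark estimate w* from the two images."""
    h_star = dctn(marked.astype(np.float64), norm="ortho").ravel()[idx]
    h = dctn(original.astype(np.float64), norm="ortho").ravel()[idx]
    return (h_star - h) / (alpha * h)   # w_i* = (h_i* - h_i) / (alpha h_i)
```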

Robustness of Frequency Domain Watermarks

Watermarks in the frequency domain are known to be more robust to common watermarking attacks:

Hybrid Watermarking

One simple technique involves comparing the DCT coefficients at (2, 3) and (4, 1) on a block-by-block basis, as these components have been shown to have roughly equal visual importance. A sketch follows the steps below.

  1. Divide image into blocks.
  2. For each block take 2D DCT.
  3. Choose two locations of equal perceptual importance (for example (2, 3) and (4, 1)); take (2, 3) > (4, 1) to encode a 0, and (2, 3) ≤ (4, 1) to encode a 1.
    1. Swap the two values if they do not encode the desired bit - visual quality is only very slightly affected by this.
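
A sketch of this block-wise scheme, again using scipy.fft (the block size and the two positions follow the steps above):

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_bits(image: np.ndarray, bits: list[int], block: int = 8) -> np.ndarray:
    """Embed one bit per block by ordering the (2, 3) and (4, 1) DCT
    coefficients: (2, 3) > (4, 1) encodes 0, otherwise 1."""
    out = image.astype(np.float64).copy()
    rows, cols = image.shape[0] // block, image.shape[1] // block
    for k, bit in enumerate(bits[: rows * cols]):
        r, c = (k // cols) * block, (k % cols) * block
        d = dctn(out[r:r + block, c:c + block], norm="ortho")
        if (d[2, 3] > d[4, 1]) != (bit == 0):
            d[2, 3], d[4, 1] = d[4, 1], d[2, 3]   # swap to encode the bit
        out[r:r + block, c:c + block] = idctn(d, norm="ortho")
    return out
```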

Comparing Watermarks

Attacks on Watermarks


JPEG Compression

JPEG is the most common image compression algorithm. It is a lossy algorithm - every time you compress an image using JPEG, some information loss will occur.

The steps involved in the algorithm are (a sketch of the quantisation step follows the list):

  1. Colour space conversion
    • Converted to YCbCr - we want to separate the intensity and colour components so they can be compressed differently.
    • Cb and Cr components are subsampled by a factor of 2, though this is skipped when the highest quality is required.
  2. Division into sub-images
    • Image is divided into non-overlapping blocks of size 8x8 or 16x16.
    • Encoding is done on each block independently (each block is called a macro-block, or minimum coded unit).
    • Helps ensure better compression when DCT is applied without high information loss.
  3. Discrete Cosine Transform
    • Compute DCT for each block.
  4. Quantiser
    • Main lossy step in JPEG. Each coefficient is quantised by a predefined factor.
    • Many-to-one mapping, i.e. multiple input coefficients get mapped to a single output value.
    • $F_Q(u, v) = \text{round}\left( \frac{F(u, v)}{Q(u, v)} \right)$
    • $Q(u, v)$ is the quantisation matrix. Quantisation factors are typically larger in the high frequencies.
    • Each channel will be quantised by a predefined quantisation matrix. Values are typically larger for the chroma channels.
    • Designed to roughly model the sensitivity of the human visual system.
  5. Entropy coding
    • Huffman coding is used to reduce the number of bits required to represent each block.
    • Coefficients within each block are stored starting with the low frequency components, and ending with the high frequency components.
      • This is so the 0 values, which often occur in the high frequency components, are bunched together.

Huffman Coding

Should know how to do this, notes will be made later (from JPEG compression lecture).
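
In the meantime, a minimal heap-based sketch of Huffman code construction (the example string is illustrative):

```python
import heapq
from collections import Counter

def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code table from symbol frequencies."""
    # Heap entries: (frequency, tiebreak, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, i, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, i, merged))
    return heap[0][2]

print(huffman_codes(Counter("this is an example")))
```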

JPEG Header

A JPEG header contains the following information:

The header can be used as a forensic tool - up to 576 values can be extracted to form a signature of the image. Any manipulation will alter this signature, and so can be detected. Signatures can be compared to those stored for known authentic cameras.


Compression-Based Forensics

Double Compression

The number of original histogram bins $n(v)$ contributing to bin $v$ in the double quantised histogram is given (assuming floor-based quantisation, $q_s(u) = \lfloor u/s \rfloor$) by:

$$n(v) = b \left( \left\lceil \frac{a}{b}(v + 1) \right\rceil - \left\lceil \frac{a}{b} v \right\rceil \right)$$

Where $b$ is the first quantisation factor, and $a$ is the second quantisation factor. This is a periodic function, with period $b$, and some bins may receive no values at all - these periodic artefacts reveal double compression.
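
A quick empirical check of this periodicity, simulating floor-based double quantisation directly (the factors are illustrative):

```python
import numpy as np

b, a = 3, 2                    # first and second quantisation factors
u = np.arange(3000)            # original coefficient values
v = (u // b) * b // a          # quantise by b, dequantise, requantise by a

print(np.bincount(v)[:12])     # [3 3 0 3 3 0 ...]: period b, with empty bins
```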

Limitations

JPEG Ghosts

Quantisation Error

Therefore, recompressing an image at a range of quality levels and looking for drops in the SSD between the original and recompressed versions will identify the quality levels at which the image (or part of the image) has previously been compressed. By examining the SSDs across the image, it is possible to identify regions which may have been compressed multiple times.

Note that the physical properties of regions in an image could affect the forensic analysis with this technique. Differences can be spatially averaged and normalised to help identify the specific regions affected.
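
A sketch of the recompression sweep using Pillow (assumes an RGB or greyscale input; the quality range is illustrative):

```python
import io
import numpy as np
from PIL import Image

def jpeg_ghost_ssd(img: Image.Image, qualities=range(30, 100, 5)) -> dict:
    """SSD between an image and JPEG-recompressed copies of itself.
    Dips indicate qualities the image (or a region) was previously saved at."""
    x = np.asarray(img.convert("L"), dtype=np.float64)
    ssd = {}
    for q in qualities:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        y = np.asarray(Image.open(buf).convert("L"), dtype=np.float64)
        ssd[q] = float(np.sum((x - y) ** 2))
    return ssd
```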


Copy-Move Forgery

Circular Shift & Match

Circular shift and match is an algorithm to detect cloned regions in an image more efficiently (a sketch follows the steps below).

  1. Initialise an accumulator image $A$ with all 0s.
  2. Initialise $k, l$ to 0.
  3. For each shift $(k, l)$ up to $(k_{max}, l_{max})$:
    1. Compute $S$ from input image $I$ by using a circular shift of $(k, l)$.
    2. Compute binary image $D = |I-S| < t$ for some threshold $t$.
    3. Erode and dilate $D$ using a structuring element of size $b\times b$ for some $b$ to create $D_{ed}$.
    4. Update $A$ as $A = A + D_{ed}$.
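
A sketch of this loop with NumPy and scipy.ndimage (the threshold t and structuring-element size b are illustrative):

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def shift_and_match(I: np.ndarray, k_max: int, l_max: int,
                    t: float = 10.0, b: int = 5) -> np.ndarray:
    """Accumulate matched regions over all circular shifts (k, l)."""
    A = np.zeros(I.shape, dtype=int)
    structure = np.ones((b, b), dtype=bool)
    for k in range(k_max):
        for l in range(l_max):
            if k == 0 and l == 0:
                continue                               # zero shift trivially matches
            S = np.roll(I, shift=(k, l), axis=(0, 1))  # circular shift
            D = np.abs(I.astype(float) - S.astype(float)) < t
            D_ed = binary_dilation(binary_erosion(D, structure), structure)
            A += D_ed
    return A
```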

Feature Matching

Features

Properties of a good feature are:

Local Binary Pattern

The LBP code of a pixel is formed by thresholding each of its 8 neighbours against the centre value and reading the results as an 8-bit binary code. We can then compute an LBP histogram. This can be global, or computed for smaller regions.

Codes can be further ‘compressed’ by considering Uniform LBPs, which are those with at most 2 bitwise transitions around the circular code. Most image pixels get encoded into a uniform LBP.
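
A sketch computing plain 8-neighbour LBP codes for a greyscale array (the neighbour ordering is one common convention):

```python
import numpy as np

def lbp_codes(img: np.ndarray) -> np.ndarray:
    """8-neighbour LBP: threshold each neighbour against the centre pixel
    and pack the eight results into one byte."""
    h, w = img.shape
    centre = img[1:-1, 1:-1].astype(np.int16)
    ring = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]          # clockwise neighbours
    code = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(ring):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(np.int16)
        code |= (nb >= centre).astype(np.uint8) << np.uint8(bit)
    return code

# hist = np.bincount(lbp_codes(image).ravel(), minlength=256)
```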

Histogram of Oriented Gradients

Harris Corner Detector

This is a popular key-point based feature detector.

The basic idea is that we are looking for regions which change in both the horizontal and vertical directions - these are likely to be corners. Look at the error between two patches shifted by $(u,v)$:

$$E(u, v) = \sum_{x, y} w(x, y) \left[ I(x + u, y + v) - I(x, y) \right]^2$$

Where $w(x, y)$ is a window function. Corners yield a large $E(u,v)$ in every shift direction, whereas edges do so only perpendicular to the edge. Using a first-order Taylor series expansion, we can represent this as a matrix equation, $E(u,v) \approx \begin{pmatrix} u & v \end{pmatrix} M \begin{pmatrix} u \\ v \end{pmatrix}$ (same matrix as in Robotics), and the corner response $R$ is defined as $R = \text{det}(M) - k(\text{trace}(M))^2$, where $k$ is some constant.

For each pixel, the image gradient can be approximated using kernel convolution.

We can then threshold by some $t$ to identify features only with large enough $R$ values.
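
A sketch of the response computation with scipy.ndimage, using Sobel kernels for the gradients and a Gaussian window for the entries of $M$ ($k$ and $\sigma$ are illustrative):

```python
import numpy as np
from scipy.ndimage import sobel, gaussian_filter

def harris_response(img: np.ndarray, k: float = 0.05, sigma: float = 1.0) -> np.ndarray:
    """Per-pixel corner response R = det(M) - k * trace(M)^2."""
    ix = sobel(img.astype(np.float64), axis=1)   # horizontal gradient
    iy = sobel(img.astype(np.float64), axis=0)   # vertical gradient
    ixx = gaussian_filter(ix * ix, sigma)        # windowed entries of M
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    return (ixx * iyy - ixy ** 2) - k * (ixx + iyy) ** 2

# corners = harris_response(image) > t   # keep only strong responses
```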

Feature Matching


Sensor Forensics

Sensor Noise can be split into two broad categories: Shot Noise and Sensor Pattern Noise.

We want to use PRNU (Photo-Response Non-Uniformity) for sensor identification.

We can model the image acquisition process as:

$$y_{ij} = f_{ij} x_{ij} + \eta_{ij} + c_{ij} + \varepsilon_{ij}$$

Where $y_{ij}$ is the sensor output, $f_{ij}$ is the PRNU factor, $x_{ij}$ is the incident light intensity, $\eta_{ij}$ is shot noise, $c_{ij}$ is fixed pattern (dark current) noise, and $\varepsilon_{ij}$ is random read-out noise.

Ignoring shot and random noise, we can therefore write the PRNU-corrected sensor output, $\hat{x}_{ij}$, as:

$$\hat{x}_{ij} = \frac{y_{ij}}{\hat{f}_{ij}}$$

$\hat{f}_{ij}$ is an approximation of the PRNU, and can be obtained by averaging unprocessed images of a uniformly lit scene taken by the target sensor.

Approximating Reference SPN

  1. Ignore FPN.
  2. For a series of images taken by the sensor…
  3. Suppress scene content by denoising (possibly using averaging).
  4. Subtract de-noised version from original image.
  5. Average noise residuals.
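
A sketch of this averaging procedure; a Gaussian filter stands in here for the (typically wavelet-based) denoiser used in practice:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def reference_spn(images: list[np.ndarray], sigma: float = 1.0) -> np.ndarray:
    """Estimate the reference SPN by averaging noise residuals."""
    residuals = []
    for img in images:
        x = img.astype(np.float64)
        denoised = gaussian_filter(x, sigma)   # stand-in denoiser
        residuals.append(x - denoised)         # noise residual
    return np.mean(residuals, axis=0)          # averaging suppresses scene content
```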

Comparing Reference and Test SPN


Video Forensics

Video Encoding

We assign some different types of frames:

P-Frame Prediction

Each colour channel is encoded independently (as with images), and each sequence is divided into Groups of Pictures (GOPs). An example arrangement could be: [I B B P B B P].

Frames are typically compressed in a periodic sequence of GOP, and GOPs can be parameterised by the number of frames, $N$, and the spacing of the P-frames, $M$.

When decoding:

Frame Deletion Forgery

This pattern of motion error is easy to detect as the motion errors are explicitly encoded. We can extract the residual frames from the compressed video stream and compute the mean prediction error for the entire frame. Periodic spikes in prediction errors indicate tampering.

To spot this, we can use a Fourier transform of the mean prediction error and look for peaks arising from the periodic variations.
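
A sketch of the detection step, assuming the per-frame mean prediction errors have already been extracted from the stream:

```python
import numpy as np

def periodicity_score(mean_errors: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of the per-frame mean prediction error signal.
    Strong peaks away from DC suggest periodic spikes, i.e. deleted frames."""
    e = mean_errors - mean_errors.mean()   # remove the DC offset
    return np.abs(np.fft.rfft(e))

# errors[i] = mean prediction error of frame i, taken from the decoder
# spectrum = periodicity_score(errors)
```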

Re-Projection Forgery

There is much more detail of the maths in the slides, but this appears to be the general overview of the technique.