Alexander Logan

Mobile Robotics

Contents

  1. State Estimation
    1. Bayesian State Estimation
    2. Discrete Filter
    3. Linear Gaussian Filters
  2. Mapping
  3. Motion Planning
  4. Sensors
  5. Vision

State Estimation

We use probabilistic state estimation because the world itself is uncertain.

Perhaps the most important formula for this is Bayes Theorem:
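
$$P(x | y) = \frac{P(y | x) P(x)}{P(y)}$$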

In a lot of cases, obtaining $P(y | x)$ is much easier than $P(x | y)$.

In practice, the denominator (the evidence) can be obtained by summing the numerator over all possible values of $x$ and normalising. Therefore, we can think of the formula as $P(x|y) = \eta P(y|x)P(x)$, where $\eta$ is the normalising constant that makes the posterior sum to 1.

Bayesian State Estimation

Suppose a robot makes an observation $z$ when trying to detect if a door is open. We begin with the robot not knowing any prior information about the state of the door - the door has equal probability of being open or closed.

What we’re after is the probability the door is open given our sensor measurement, $P(open | z)$. This cannot be directly measured, however.

Now suppose that we know if the door is open, then we would have had a probability of 0.6 of observing our measurement $z$. That is, $P(z | open) = 0.6$. Similarly if the door was not open, suppose we know the probability we would have made this measurement is 0.3, so $P(z | \lnot open) = 0.3$. Note that these probabilities do not necessarily need to sum to 1.

We can now apply Bayes theorem to obtain $P(open | z)$. The denominator represents the probability of obtaining the measurement $z$ ($P(z)$), so we enumerate each possibility leading to this result (in this case the door either being open or not open) and sum these probabilities, since the possibilities are mutually exclusive (the law of total probability).

So:
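
$$P(open | z) = \frac{P(z | open) P(open)}{P(z | open) P(open) + P(z | \lnot open) P(\lnot open)}$$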

Now substituting in our values, remembering our prior belief is that $P(open) = 0.5$:
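
$$P(open | z) = \frac{0.6 \times 0.5}{0.6 \times 0.5 + 0.3 \times 0.5} = \frac{0.3}{0.45} \approx 0.67$$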

Notice this probability is different from the 0.5 we previously assumed was the chance the door was open - by observing this measurement we have changed our belief about the state of the door.

Subsequent Measurements

Bayesian State Estimation relies on an important assumption - the Markov Assumption. This assumes that the current sensor measurement is independent of all previous measurements, given the current state $x$. In the real world this isn't always true, but as long as it is approximately correct it works well in most cases.

In the Recursive Bayes method, we can then take the updated belief following our first observation, $P(open | z_1)$, and use it as the prior for subsequent measurements. Exactly the same method is used as above, and we end up obtaining $P(open | z_1, z_2)$. This can continue for as many measurements as required.

In practice, this should be implemented using dynamic programming. Following each iteration, the only information we are required to store is the updated belief, and this can then be used directly in the next iteration without having to recalculate all the priors from scratch.
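
As a concrete illustration, a minimal sketch of this recursive update for the door example might look like the following (the measurement likelihoods are the illustrative values from above, and the function name is made up):

```python
def recursive_bayes(prior_open, measurements, p_z_given_open, p_z_given_closed):
    """Recursively update the belief that the door is open.

    measurements is a sequence of observed sensor values z; the two
    dictionaries give P(z | open) and P(z | not open) for each value.
    Only the current belief is stored between iterations.
    """
    belief = prior_open
    for z in measurements:
        numerator = p_z_given_open[z] * belief
        evidence = numerator + p_z_given_closed[z] * (1.0 - belief)
        belief = numerator / evidence   # P(open | z_1, ..., z_t)
    return belief

# Example: two identical 'door looks open' measurements, starting from a 0.5 prior.
print(recursive_bayes(0.5, ["z", "z"], {"z": 0.6}, {"z": 0.3}))  # ~0.8
```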

Bayes Filter

In a Bayes filter, we take the following inputs:
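
  1. The stream of sensor observations $z$ and action data $u$: $d_t = \{u_1, z_1, \ldots, u_t, z_t\}$.
  2. The sensor model $P(z | x)$.
  3. The action (motion) model $P(x | u, x')$.
  4. The prior probability of the system state, $P(x)$.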

We then output the posterior of $x$, that is, the probability we are in any given state $x_t$ given all of the previous observations and actions.

Derivation

We’re after finding the belief for a given state given all previous observations and actions:
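
$$Bel(x_t) = P(x_t | u_1, z_1, \ldots, u_t, z_t)$$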

The first thing we do is apply Bayes rule - we want this in terms of the probability of observations given the actual state. Simplifying with a normalising factor $\eta$, we obtain:
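
$$Bel(x_t) = \eta \, P(z_t | x_t, u_1, z_1, \ldots, u_t) \, P(x_t | u_1, z_1, \ldots, u_t)$$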

That is, we’re now looking at the probability of obtaining a given sensor value given the current state (and, at this point, all previous observations and actions). But, the Markov assumption applies here, so in fact the sensor observation does not need to depend on all previous observations - only the current state - so this can be simplified to:
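
$$Bel(x_t) = \eta \, P(z_t | x_t) \, P(x_t | u_1, z_1, \ldots, u_t)$$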

Now, the right hand term (the prior) represents the probability of being in state $x_t$ given the previous sequence of actions and sensor observations. Similar to the door example above, we find this by considering the probability we would transition to $x_t$ from every possible previous state $x_{t-1}$, weighted by how likely each of those previous states was. In the door example, that would be asking ‘what is the probability the door is now open, considering every state the door could have been in previously and how probable each of those states was’. In that example we could represent this with a summation, but in general it is an integral (the law of total probability):
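
$$Bel(x_t) = \eta \, P(z_t | x_t) \int P(x_t | u_1, z_1, \ldots, u_t, x_{t-1}) \, P(x_{t-1} | u_1, z_1, \ldots, u_t) \, dx_{t-1}$$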

We can make a further simplification due to the Markov assumption - we assume our state only depends on the previous state and whatever action is performed, so the first term after the integral can be reduced:
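
$$Bel(x_t) = \eta \, P(z_t | x_t) \int P(x_t | u_t, x_{t-1}) \, P(x_{t-1} | u_1, z_1, \ldots, z_{t-1}) \, dx_{t-1}$$

(Here we have also dropped $u_t$ from the conditioning of the second term, since an action taken at time $t$ tells us nothing about the earlier state $x_{t-1}$.)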

And now finally notice the second term in the integral - it is precisely our definition for $Bel(x_{t-1})$! Therefore we can substitute this in and finally obtain the complete formulation:
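
$$Bel(x_t) = \eta \, P(z_t | x_t) \int P(x_t | u_t, x_{t-1}) \, Bel(x_{t-1}) \, dx_{t-1}$$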

We can see how this relates to the idea of taking some prior belief, making a prediction, and verifying this with measurements:
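
$$\overline{Bel}(x_t) = \int P(x_t | u_t, x_{t-1}) \, Bel(x_{t-1}) \, dx_{t-1} \qquad \text{(prediction)}$$

$$Bel(x_t) = \eta \, P(z_t | x_t) \, \overline{Bel}(x_t) \qquad \text{(correction using the measurement)}$$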

Discrete Filter

The main problem with the Bayesian state estimation formula is the integral, so we make assumptions to simplify things. In a discrete filter, we assume everything is discrete. Importantly, this means we assume the state space is finite.

The integral then becomes a sum over all possible states. The models can be represented by $N \times N$ matrices, where $N$ is the total number of states, and the entire process can then be modelled using matrix multiplication.

In our matrices for sensor observation probabilities, each row will represent a given state the robot could be in, with the columns representing the probability the sensor will report a given state, if the robot was in the actual state given by the row. Therefore, rows should sum to 1, though columns do not need to.

When a sensor measurement is made, we construct a matrix to use in our calculation representing $P(z_t | x_t)$ by taking the column of the sensor observation matrix corresponding to this measurement and constructing a diagonal matrix with these values.

Similarly, our prediction step is represented by a matrix of possible state transitions, with each row representing a possible state and values in corresponding columns representing the probability of transitioning to those states.
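
As an illustration, a minimal numpy sketch of one prediction/update cycle using these matrix conventions (function and variable names here are illustrative) might look like:

```python
import numpy as np

def discrete_filter_step(belief, T, Z, z):
    """One cycle of the discrete Bayes filter.

    belief : length-N vector of state probabilities.
    T      : N x N transition matrix, T[i, j] = P(next state j | current state i).
    Z      : N x N observation matrix, Z[i, j] = P(sensor reports j | true state i).
    z      : index of the observed sensor reading.
    """
    # Prediction: sum over all previous states we could have transitioned from.
    predicted = T.T @ belief
    # Update: weight by P(z | x) (a diagonal matrix built from column z), then normalise.
    updated = np.diag(Z[:, z]) @ predicted
    return updated / updated.sum()

# Two-state door example: state 0 = open, state 1 = closed, static door.
T = np.eye(2)                        # the door does not change state by itself
Z = np.array([[0.6, 0.4],
              [0.3, 0.7]])           # rows sum to 1
belief = np.array([0.5, 0.5])
belief = discrete_filter_step(belief, T, Z, z=0)
print(belief)                        # ~[0.667, 0.333]
```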

The problem with discrete filters is that even for relatively small numbers of states, the computational complexity grows very quickly, and for a lot of problems modelling in a discrete way is not realistic.

Examples coming soon…

Linear Gaussian Filters

Linear Gaussian filters are the first step into continuous state estimation. In these, we assume all the components required by the Bayesian formula can be modelled with gaussians.

Gaussians have the useful property that applying a linear function to a gaussian, or multiplying two gaussians together, always results in another gaussian - this is the property which allows us to do this.

Kalman Filter

The Kalman Filter assumes we have a:
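
  1. Linear motion (state transition) model, with added gaussian noise.
  2. Linear sensor (measurement) model, with added gaussian noise.
  3. Gaussian initial belief.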

This means, we can model the control using:
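
$$x_t = A_t x_{t-1} + B_t u_t + \varepsilon_t$$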

And we can model the sensor measurements using:
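
$$z_t = C_t x_t + \delta_t$$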

Where…
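
  1. $x_t$ is the state and $u_t$ the control input at time $t$.
  2. $A_t$ describes how the state evolves without controls or noise, and $B_t$ describes how the control $u_t$ changes the state.
  3. $C_t$ describes how measurements are obtained from the state.
  4. $\varepsilon_t \sim \mathcal{N}(0, R_t)$ and $\delta_t \sim \mathcal{N}(0, Q_t)$ are zero-mean gaussian motion and measurement noise respectively.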

Initialisation

We assume the initial state can be modelled as a gaussian. This is required because otherwise our subsequent steps would not still result in gaussians.

Prediction

In the prediction step, we calculate the predicted mean and error covariance using:
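
$$\bar{\mu}_t = A_t \mu_{t-1} + B_t u_t$$

$$\bar{\Sigma}_t = A_t \Sigma_{t-1} A_t^T + R_t$$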

Update

In the update step, we first calculate the Kalman gain $K_t$, using:
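
$$K_t = \bar{\Sigma}_t C_t^T \left( C_t \bar{\Sigma}_t C_t^T + Q_t \right)^{-1}$$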

We then update the state estimate and error covariance:
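
$$\mu_t = \bar{\mu}_t + K_t (z_t - C_t \bar{\mu}_t)$$

$$\Sigma_t = (I - K_t C_t) \bar{\Sigma}_t$$

Putting the two steps together, a minimal numpy sketch of a single predict/update cycle (using the matrices $A$, $B$, $C$ and noise covariances $R$, $Q$ as above; function names are illustrative) might look like:

```python
import numpy as np

def kalman_predict(mu, Sigma, A, B, u, R):
    """Prediction: propagate the belief through the linear motion model."""
    mu_bar = A @ mu + B @ u
    Sigma_bar = A @ Sigma @ A.T + R
    return mu_bar, Sigma_bar

def kalman_update(mu_bar, Sigma_bar, C, z, Q):
    """Update: correct the predicted belief using the measurement z."""
    S = C @ Sigma_bar @ C.T + Q                # innovation covariance
    K = Sigma_bar @ C.T @ np.linalg.inv(S)     # Kalman gain
    mu = mu_bar + K @ (z - C @ mu_bar)
    Sigma = (np.eye(len(mu_bar)) - K @ C) @ Sigma_bar
    return mu, Sigma
```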

More filters TODO…


Mapping

Occupancy Grid Maps

Assumptions
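
  1. The area to be mapped is divided into a grid of cells, each of which is either occupied or free.
  2. Cells are assumed to be independent of one another.
  3. The robot poses are assumed to be known while mapping.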

In this mapping scheme, we update each individual cell using a binary Bayes filter.

We use log odds to represent occupancy, calculated using:
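
$$l(m_i) = \log \frac{P(m_i)}{1 - P(m_i)}$$

The probability can be recovered from the log odds using $P(m_i) = 1 - \frac{1}{1 + e^{l(m_i)}}$.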

Algorithm Outline

  1. For all cells $m_i$…
    1. If $m_i$ is in the perceptual field of $z_t$ (the sensor measurement) then…
      1. $l_{t,i} = l_{t-1, i} + \text{inverse\_sensor\_model}(m_i, x_t, z_t) - l_0$.
    2. Otherwise $l_{t,i}$ doesn’t change, i.e. $l_{t, i} = l_{t-1, i}$.
  2. Return occupancies $\{l_{t,i}\}$.

The inverse sensor model decides whether a cell is occupied or not given the sensor observations. It is called ‘inverse’ because it reasons from effects to causes, which is the opposite of the usual Bayes filters which reason from causes to effects. This can be learned using a supervised learning algorithm from the forward model.
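
A minimal sketch of the per-cell log odds update (with illustrative constants standing in for a real inverse sensor model) might look like:

```python
import numpy as np

def log_odds(p):
    return np.log(p / (1.0 - p))

# Illustrative constants - a real inverse sensor model would derive these
# from the sensor geometry and the measurement z_t.
L_OCC, L_FREE, L_0 = log_odds(0.7), log_odds(0.3), log_odds(0.5)

def update_cell(l_prev, in_perceptual_field, looks_occupied):
    """Binary Bayes filter update of a single grid cell, in log odds form."""
    if not in_perceptual_field:
        return l_prev                                 # cell unaffected by this measurement
    l_inv = L_OCC if looks_occupied else L_FREE       # stand-in for inverse_sensor_model(...)
    return l_prev + l_inv - L_0
```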

SLAM

The SLAM problem involves trying to estimate the robot pose and map of the environment simultaneously.

Full SLAM attempts to estimate the entire path the robot has taken, that is $p(x_{0:t}, m | z_{1:t}, u_{1:t})$ - where $x$ is the robot position, $m$ is the map, $z$ are the sensor measurements, and $u$ are the control inputs.

Online SLAM only attempts to estimate the current pose, that is $p(x_t, m | z_{1:t}, u_{1:t})$. In practice, online SLAM is done recursively, similar to how recursive Bayes is implemented.

There are many considerations to make when developing a method for SLAM:

There are three main paradigms for SLAM:
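
  1. Kalman filter based approaches (e.g. EKF-SLAM).
  2. Particle filter based approaches (e.g. FastSLAM).
  3. Graph-based approaches.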

EKF-SLAM

Don’t need to understand the mathematics behind this for the exam.

This method operates by using an EKF to estimate both the robot pose and the locations of the features in the environment. The state space can be:
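
$$x_t = (x, y, \theta, m_{1,x}, m_{1,y}, \ldots, m_{n,x}, m_{n,y})^T$$

where $(x, y, \theta)$ is the robot pose and $(m_{i,x}, m_{i,y})$ is the position of the $i$-th landmark.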

With $n$ landmarks, the gaussian required is $(3+2n)$ dimensional.

As with an EKF, the operation of the algorithm follows the process of:

  1. State prediction.
  2. Measurement prediction.
  3. Obtain measurement.
  4. Data association.
  5. Update prediction.

Motion Planning

The problem of motion planning is defined as given:
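
  1. A start pose of the robot.
  2. A desired goal pose.
  3. A geometric description of the robot.
  4. A geometric description of the world (the obstacles).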

we want to find a path that:
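
  1. Moves the robot gradually from the start pose to the goal pose.
  2. Never touches any obstacle.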

The motion planning problem is typically solved in another space, the configuration space. The configuration space is the set of all possible robot configurations. A robot configuration $q$ describes the positions of all robot points with respect to a reference frame (the robot is not a point, so a configuration is expressed as a vector of several positions and orientations).

Assuming a circular robot, we can obtain the C-space obstacles by ‘sliding’ the robot along the edges of all obstacles and tracing the positions of the robot’s centre - this effectively inflates each obstacle by the robot’s radius. With a polygonal robot, the same is done, but the sliding must be repeated over all possible orientations of the robot.

We can define the free space given a configuration space as the subset of the space such that no obstacles are present. Similarly, the obstacle space is the complement of free space.

Planning can then be done as if the robot was a point in free space.

As C-space is continuous, we want to discretise it to generate a ‘road map’ - a graph of free space where each vertex is a configuration in free space and each edge represents a collision-free path.

Combinatorial Methods

Visibility Graphs

The idea of this approach is to construct the road map using visibility through free space.

E.g. from each vertex of an obstacle in C-space, which other vertices of other obstacles can be seen?

Voronoi Diagram

Another approach is to construct a Voronoi diagram and use that as the map.

Can be constructed quickly, though the paths generated are often not optimal in open space.

Cell Decomposition

Involves splitting the free-space into non-overlapping cells (e.g. trapezoids).

This effectively approximates the edges of obstacles.

Sampling-Based Methods

Probabilistic Road Maps

Vertices are randomly sampled from C-space - if a point is in free space then we can add it as a vertex. Vertices close to each other are then connected, either using some local planning algorithm within a given radius, or a kNN algorithm.

The advantages of this approach are that it is not necessary to explicitly construct the C-space obstacles, and it remains efficient in higher-dimensional C-spaces. However, it can fail to compute paths through narrow passages, and the resulting paths are not optimal.
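
A minimal sketch of the construction step, assuming a point robot in the unit square with circular obstacles (all function and parameter names here are illustrative), might look like:

```python
import numpy as np

def in_free_space(p, obstacles):
    """True if point p avoids every circular obstacle (cx, cy, radius)."""
    return all(np.hypot(p[0] - cx, p[1] - cy) > r for cx, cy, r in obstacles)

def edge_is_free(p, q, obstacles, steps=20):
    """Check the straight segment p->q by testing intermediate points."""
    return all(in_free_space(p + t * (q - p), obstacles)
               for t in np.linspace(0.0, 1.0, steps))

def build_prm(obstacles, n_samples=200, k=8, seed=0):
    """Sample a probabilistic road map over the unit square."""
    rng = np.random.default_rng(seed)
    verts = []
    while len(verts) < n_samples:                 # rejection-sample free-space vertices
        p = rng.random(2)
        if in_free_space(p, obstacles):
            verts.append(p)
    verts = np.array(verts)
    edges = {i: [] for i in range(n_samples)}
    for i, p in enumerate(verts):                 # connect to k nearest neighbours
        dists = np.linalg.norm(verts - p, axis=1)
        for j in map(int, np.argsort(dists)[1:k + 1]):
            if j not in edges[i] and edge_is_free(p, verts[j], obstacles):
                edges[i].append(j)
                edges[j].append(i)
    return verts, edges

# Example: a road map avoiding two circular obstacles.
verts, edges = build_prm([(0.5, 0.5, 0.2), (0.2, 0.8, 0.1)])
```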

Random Trees

The algorithm operates by aggressively probing and exploring the C-space, expanding incrementally from some initial configuration. The explored territory is represented by a tree rooted at the initial configuration $q_0$.

In some cases, this can be done bi-directionally, growing two trees with one starting at the initial pose and one starting from the goal pose.

Searching

Searching is often done using A*, though of course any standard search algorithm can be used.

Importantly however, these assume the robot will be able to execute the planned path perfectly, which in practice is often not the case.


Sensors

Robots need sensors to measure parameters of both the robot itself and its environment.

The Sensing Process

The sensing process consists of five main steps:

  1. Transduction - a transducer converts a physical quantity into an electrical signal.
  2. Signal Conditioning - we want our signals to be linear, and they may need to be amplified.
  3. Computer Interface.
  4. Data Reduction and Analysis - Data may be compressed if required and fused to generate a model of the world.
  5. Perception - Models are analysed to infer about the state of the robot surroundings.

Characterising Sensors

Sensors are noisy and often return an incomplete picture. They may also vary in performance and accuracy given the environment.

Some characteristics to consider are resolution, bandwidth, sensitivity, accuracy, and precision:

Resolution

Resolution is the minimum difference between two values that the sensor can distinguish. For digital sensors, this is usually determined by the resolution of the ADC (analogue-to-digital converter).

Bandwidth or Frequency

Used to measure the speed with which a sensor can report readings - measured in Hertz.

Mobile robots are often limited in maximum speed by the bandwidth of their obstacle detection sensors.

Sensitivity

Measure of how much the target signal changes the output signal - ratio of output change to input change.

Cross-sensitivity is a measure of sensitivity to parameters other than (orthogonal to) the target parameter (e.g. a compass being disturbed by the iron structure of a building).

Accuracy and Error

Defined as the degree of conformity between the sensor measurements and the true value.

The difference between the measured value and true value is known as the error.

There are two types of error:
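
  1. Systematic (deterministic) errors - caused by factors that can in principle be modelled and compensated for, e.g. poor calibration.
  2. Random errors - cannot be predicted deterministically, and are instead described statistically (e.g. by their standard deviation).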

Precision

Related to the reproducibility of the results:
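
$$\text{precision} = \frac{\text{range}}{\sigma}$$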

where the range is the sensor’s output range and $\sigma$ is the standard deviation of the random error.

Sensor Classification

Sensors may be classified based on particular characteristics:

One example is Proprioceptive (Internal) vs Exteroceptive (External):
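
  1. Proprioceptive sensors measure values internal to the robot itself, e.g. motor speed, wheel load, joint angles, or battery voltage.
  2. Exteroceptive sensors acquire information from the robot’s environment, e.g. distances to objects, light intensity, or sound amplitude.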

Another is Passive vs Active:
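
  1. Passive sensors measure ambient energy entering the sensor, e.g. temperature probes, microphones, or cameras.
  2. Active sensors emit energy into the environment and measure the environment’s reaction, e.g. ultrasonic sensors or laser rangefinders. This often gives better performance, but risks interference with the environment or with other sensors.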

Common Sensors

GPS

For measuring position in the world, we can use GPS. The civilian version is accurate to about 10m.

At least 4 satellites can be seen from almost anywhere, and each transmits its position and the current time. Receivers can pick up these signals and determine their position using trilateration (using the signal travel time to measure the distance to each satellite).

Derivation

Consider the case of three satellites, with distances measured to each as $r_1, r_2$ and $r_3$. We choose coordinates so that the first satellite is at the origin $(0, 0, 0)$, the second lies on the $x$ axis at $(d, 0, 0)$ for some $d$, and the third is at $(i, j, 0)$ - that is, all three satellites lie in the $z = 0$ plane, with the $j$ direction orthogonal to the line joining satellites 1 and 2.

Therefore, we have:
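
$$r_1^2 = x^2 + y^2 + z^2$$

$$r_2^2 = (x - d)^2 + y^2 + z^2$$

$$r_3^2 = (x - i)^2 + (y - j)^2 + z^2$$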

Now $r_1^2 - r_2^2 = x^2 - (x-d)^2$. This can be expanded to obtain $r_1^2 - r_2^2 = 2xd - d^2$, and so we are able to obtain:
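
$$x = \frac{r_1^2 - r_2^2 + d^2}{2d}$$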

Now we solve for $y$ using $r_3^2 - r_1^2$:
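
$$y = \frac{r_1^2 - r_3^2 + i^2 + j^2 - 2ix}{2j}$$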

Finally, $z$ can simply be obtained from the equation for $r_1^2$ by:
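
$$z = \pm\sqrt{r_1^2 - x^2 - y^2}$$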

In practice however, we usually use 4 satellites. This is because accurate positioning relies on accurate timing, and receiver clocks are much less accurate than the transmitting clocks. Therefore, you actually need to solve for $x$, $y$, $z$, and $t$ simultaneously.

Differential GPS can be used to improve regular GPS using information from land-based transmitters with exact known locations.


Vision

Robot vision is the automatic understanding of images and video.

The sensing process consists of first transducing (using cameras/sensors to detect the information), then measurement and modelling, leading finally to perception.

Often, image data is reduced in resolution (either by deleting pixels or reducing colour data) before processing. In some cases, binary images are used.

Structure from Stereo Vision

Given two image sensors, distance $b$ apart, with a focal length $f$, and target point $(x, y, z)$:
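
With the origin placed midway between the two lenses (one common convention), similar triangles give:

$$\frac{x_l}{f} = \frac{x + b/2}{z}, \qquad \frac{x_r}{f} = \frac{x - b/2}{z}$$

where $x_l$ and $x_r$ are the horizontal image coordinates of the point in the left and right images. Subtracting and rearranging gives the depth in terms of the disparity $x_l - x_r$:

$$z = \frac{b f}{x_l - x_r}$$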

2D Filtering

Filtering of an image using a 2D mask involves sliding the mask over the whole image and computing a weighted sum to replace each centre pixel’s intensity value.
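
A minimal sketch of this operation (a naive direct implementation, assuming a greyscale image and zero padding at the borders) might look like:

```python
import numpy as np

def filter2d(image, mask):
    """Slide a 2D mask over a greyscale image, replacing each centre pixel
    with the weighted sum of its neighbourhood (zero padding at the borders)."""
    kh, kw = mask.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * mask)
    return out

# Example: a 3x3 mean (box) mask for smoothing.
box_mask = np.ones((3, 3)) / 9.0
```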

Smoothing

Line Detection

Edge Features

Canny Edge Detector

Involves:
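
  1. Smoothing the image with a gaussian filter to reduce noise.
  2. Computing the intensity gradient magnitude and direction (e.g. with Sobel masks).
  3. Non-maximum suppression, to thin the detected edges.
  4. Double thresholding with hysteresis, keeping weak edge pixels only if they connect to strong ones.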

Thick edges can be avoided by considering the second derivatives at points - edges indicated by zero crossings. This is usually noisier, but produces more localised results.

The second order derivative can be approximated using a discrete Laplacian matrix - e.g. $\begin{pmatrix}-1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1\end{pmatrix}$.

Interest Point Detection

For the ideal feature detector, we desire:
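
  1. Repeatability - the same points should be found under different viewing conditions (rotation, scale, illumination, noise).
  2. Distinctiveness - the neighbourhood around each point should be descriptive enough to match reliably.
  3. Locality - features should be local, so they are robust to occlusion and clutter.
  4. Efficiency - detection should be fast enough for the application.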

Corner Detection

There are two main approaches for this:

With corners, considering a small ‘window’ around a point, shifting the window in any direction should result in a large change in appearance within the window. We could define this ‘Cornerness’ as:
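
$$E(u, v) = \sum_{x, y} w(x, y) \left[ I(x + u, y + v) - I(x, y) \right]^2$$

where $w(x, y)$ is the window function, $(u, v)$ is the shift, and $I$ is the image intensity.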

The Harris Corner Detector works by looking at the eigenvalues of a matrix of image gradients (the second moment matrix). If both eigenvalues are large, the region is classified as a corner. Non-maximal suppression can then be applied to these regions to identify single point corners.

The Harris Corner Detector is rotation invariant, partially intensity invariant, but not scale invariant.

SIFT

Scale Invariant Feature Transform. Operates by transforming data into scale-invariant coordinates.

SIFT has a number of advantages:
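
  1. Features are invariant to scale and rotation, and partially invariant to illumination and viewpoint changes.
  2. Features are highly distinctive, allowing reliable matching.
  3. Many features can be generated, even from small objects.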

Briefly, it operates by:

  1. Generate ‘scale space’.
  2. Find extrema in the scale space - these are potential locations for finding features.
  3. Find the location of key points - accurately locating the feature ‘key’.
  4. Assign an orientation to each key point.
  5. Describe the key point using the SIFT descriptor.

The scale space is obtained by convolving the image with gaussians of increasing scale. The difference of gaussians (an approximation to the Laplacian) between adjacent blurred images is then computed.

Each pixel is compared with 26 pixels in the current and adjacent scales of the image, and is selected if it is larger or smaller than all of the other 26 pixels. The set of points is then refined by removing points with low contrast, and removing points on edges (using a similar approach to the Harris corner detector).

Orientation is assigned using a histogram of gradient directions in a neighbourhood around the key point.

The SIFT Descriptor is then obtained by extracting the 16x16 pixel region around the key point, breaking it into 16 smaller 4x4 windows, and then within each of these computing the gradient magnitudes and orientations and placing them into an 8-bin histogram. Finally, the orientations are rotated relative to the key point’s orientation, and the resulting 128-dimensional feature vector is normalised to unit length and thresholded.

Summary
  1. Construct Scale Space
  2. Take Difference of Gaussians
  3. Locate DoG Extrema
  4. Sub Pixel Locate Potential Feature Points
  5. Filter Edge and Low Contrast Responses
  6. Assign Key Points Orientations
  7. Build Key Point Descriptors
  8. Use the features elsewhere…