Sensor Networks & Mobile Data Communications

October 10, 2020

Contents

Radio Link Quality
Data Link Layer
Network Layer
Network Reprogramming
Failure Detection
Neighbourhood View Consistency

Radio Link Quality

Environmental factors may affect the quality of links - background noise, interference, and internal hardware noise etc.

Routing protocols require link quality estimation to overcome the problems of unreliability associated with low-power links. It is also important to help maintain a stable topology.

There are a few possible important metrics:

Packet Reception Ratio (PRR) - Ratio of the number of successfully received packets to the number of transmitted packets.
Packet Error Ratio (PER) - $1- PRR$.
Received Signal Strength Indicator (RSSI) - Usually stored in a hardware register, and gives the signal strength of a received packet. When no packets being received, provides the noise floor.
Signal to Noise Ratio (SNR) - Difference (in dB) between the pure signal strength (strength without noise) and the noise floor.
Link Quality Indicator (LQI) - Vendor-specific indicator of an evaluation of the link quality.

We usually make the simplifying assumption that transmission range is circular, though in practice this is often untrue. In practice, we can define three different types of region:

Connected Region - Links typically of good quality, stable, and symmetric. PRR approaches 100%.
Disconnected Region - Links low quality and inadequate for communication. Near or absolutely 0% PRR.
Transitional Region - Links of intermediate quality and unstable, not correlated with distance, and may be symmetric. Typical PPRs range from 10%-90%.

We make some important observations:

Link quality is not correlated with distance, especially in the transitional region.
The extent of the transitional region depends on the environment, and hardware characteristics.
- Some works suggest the transitional region may be 50-80% of the transmission range, though this is hardware dependent.
Link Quality is anisotropic.
- Issues with radio irregularities.
- A non-spherical interference range exists beyond the communication range.
- Several MAC protocols make this assumption, that if a node can interfere with another’s transmission, then it is in the communication range of that node.
  - This is invalid due to the interference range.
- Arises due to radiation patterns of antennas - gain is not the same in all directions.
The spatial variation of link quality is due to constructive/destructive interference.
- Depends on the nature of the physical path between the sender and receiver.
Links with very low or high PRRs are more stable than links with moderate/average PRRs.
Over short time spans, links may experience high temporal correlation in packets reception.
- Leads to short periods of either 0% PRR or 100% PRR.
Temporal variation of link quality is due to changes in the environment characteristics.
- E.g. changes in temperature, humidity, objects moving.

Link Asymmetry

This has a big impact on the performance of higher-level protocols.

Asymmetric links are mainly located at the transitional region.
Link asymmetry is not correlated with distance.
Link asymmetry may or may not be persistent.
Hardware asymmetry and radio irregularity constitute the major causes of link asymmetry.

Interference

Interference can be external or internal. External interference can occur from co-located networks operating at the same frequency. Internal interference occurs from concurrent transmissions from nodes in the same WSN.

Link Quality Estimation

There are three types of link monitoring:

Active
- Nodes send probe packets to measure the state of the link.
- Costly due to overhead.
- Can be improved using adaptive beaconing.
Passive
- Estimates link quality using existing traffic.
- Incurs overhead of probing idle links.
- Overhearing uses significant energy.
- Estimates may be less accurate compared to active.
Hybrid
- Balance using active and passive monitoring to trade off energy efficiency and accuracy.

Some observations about hardware link quality estimators:

RSSI can provide a quick and accurate estimate of whether a link is of very good quality (in the connected region).
LQI is not a good indicator of intermediate quality links due to its high variance.
Instead looking at the variance of multiple LQI readings can be used as a measure of link quality.
LQI is a better indicator of the PRR than RSSI.
SNR is a good indicator of the PRR, but not accurate for intermediate links.
SNR is better estimator than RSSI.

There are three classes of software link quality estimators:

PRR-based - Count or approximate the PRR.
RPN-based - Count or approximate the required number of retransmissions for each packet.
Score Based - Use some score to rate link quality.

Data Link Layer

The data link layer is responsible for accessing the medium - controls when a node should transmit data to another node or set of nodes.

Nodes in ad-hoc and WSNs usually transmit in half-duplex mode (only one direction at a time). However, more than one node may transmit data simultaneously.

There are several considerations to take into account at the MAC layer:

Synchronisation between receivers and transmitters using minimum overhead.
Avoidance of collisions and needless waiting.
- Hidden node problem.
- Exposed node problem.
Quality of Service (QoS) - guaranteeing resources for transmission.
Energy efficiency, especially for WSNs.
Increase throughput, reduce latency, again especially for WSNs.

There are three main concepts for dividing access to a medium:

Frequency Division Multiple Access (FDMA) - Assignment of frequencies to transmission channels according frequency division multiplexing.
Time Division Multiple Access (TDMA) - Assignment of time slots to transmission channels according to time division multiplexing.
Code Division Multiple Access (CDMA) - Assignments of codes to transmission channels according to code division multiplexing.

In Ad-Hoc networks, protocols generally fall under two categories: contention-based and reservation-based.

Contention-Based	Reservation-Based
Rely on controlled contention between nodes to access the medium.	Each node transmits data during a previously reserved time slot. Contention occurs only during the reservation phase.
✅ Flexible - Each node makes contention decisions independently. ❌ No QoS - Nodes are not guaranteed regular access to the channel. ❌ More collisions as node density increases.	✅ Collision free ✅ QoS ❌ Time Synchronisation Required - Reservations need to be consistent across all nodes.
CSMA/CA MACAW IEEE 802.11 DBTMA	D-PRMA HRMA

CSMA/CA

Carrier sense multiple access with collision avoidance. Based on carrier sense - a node listens to the medium continuously, and if it is found to be idle for a certain period of time (called an inter-frame space (IFS)), it may transmit.

If a node wanted to transmit but sensed the medium was busy, it must first wait for the medium to become free, and then wait an inter-frame space. It will then begin contention for the medium - it will generate a random backoff time and wait this long before attempting to transmit. This helps ensure nodes do not collide by all attempting to transmit at the end of the inter-frame space.

Note that this protocol alone does not solve the hidden node problem.

CSMA/CA with Signalling Packets

The protocol can be extended to support request-to-send and clear-to-send packets to reduce the problem of interference from hidden nodes.

Any node overhearing a RTS defers all transmissions at least until a CTS is received.
Any node overhearing a CTS defers all transmissions until the data and ACK are received.
ACK informs the sender that the data has been correctly received.

In this algorithm, back off is exponentially increased upon collisions occurring.

MACAW

Multiple Access with Collision Avoidance for Wireless networks. This is not based on carrier sensing, and instead uses signalling packets:

RTS/CTS
ACK
DS - Data Sending
RRTS - Request for request-to-send

The back-off using in this algorithm is multiplicative increase and linear decrease (MILD) rather than exponential. This helps reduce cases of long back-off times occurring.

IEEE 802.11

Standard for WLANs. Based on CSMA/CA with exponential back-off timer.

RTS includes duration of data transmission and ACK.
CTS copies information about duration of data transmission.
If ACK is not received, sender uses exponential back-off timer.
Slot time is 20µs - this is the time needed to detect the transmission of a packed from any other node.
Short Inter-Frame Spacing is 10µs.
Distributed Inter-Frame Spacing is 50µs.
- $DIFS = SIFS + 2 \times \text{slot time}$.
Contention window is a minimum of 31 slots, and maximum of 1023 slots.

Also uses Network Allocation Vector (NAV) mechanisms, which specify the earliest time a node may attempt transmissions. Nodes will defer access to the medium upon hearing a RTS or CTS, as the length of time required for the transmissions is contained within these packets.

Collisions may only occur when the RTS/CTS packets are being sent. Afterwards, collisions are avoided because nodes will have set their NAVs. This RTS/CTS system is sometimes called virtual sensing of the medium.

MAC in WSNs

When considering protocols to apply in WSNs, we should take into account the expected use cases of these networks:

Usually used to collect data and route this back to some sink.
Queries are often addressed to sets of nodes satisfying certain conditions.
May have long idle periods, and latency is tolerable.

IEEE 802.15.4

Standard for low-rate Wireless PANs (personal area networks).

This protocol combines contention-based and reservation-based schemes.

In this protocol, devices can either be considered Full Function Devices (FFD) or Reduced Function Devices (RFD).

Full Function Devices
- Coordinates a number of devices.
- Communicates with any other device.
- Can be arranged in any topology.
Reduced Function Devices
- Communicate only with a coordinator, which are always FFDs.
- Limited to star topologies.
- Cannot become coordinators.

The coordinators have several tasks:

Manage a list of their associated RFDs, which is done using signalling packets.
Allocate addresses for RFDs, using IEEE 64-bit address space.
Exchange data packets with devices and with other coordinators.
When in beacon-enabled mode, also process requests to reserve resources.

In non-beaconed mode, regular CSMA/CA with ACK (but no RTS/CTS) is used.

In beacon-enabled mode, super-frames are signalled with beacons and slotted CSMA/CA is used - in the first part of the super-frame, nodes contend for access to slots (the Contention Access Period (CAP)), and in the second part nodes which have been assigned guaranteed slot are able to transmit (the Guaranteed Time Slots (GTS)).

These guaranteed slots are managed as follows:

Suppose a node A wishes to obtain a guaranteed slot. During the CAP, A sends a request packet to the coordinator indicating the number of slots it requires, and whether they are for transmission or receiving.
Coordinator replies with an ACK - indicating it has received the request. This ACK does not guarantee the reservation has been made.
The coordinator considers the resources it has available and whether it can accommodate the request.
1. If it has enough resources, it allocates the slot(s) and informs all nodes of the allocation during one of the next beacons.
2. If it does not have enough resources, it indicates this in one of the next beacons.
Node A listens to beacons to see if its reservation has been accepted.
1. If no feedback is received within a given time period, it assumes the request has failed.
2. If the allocation has been accepted, A will see the resources and slot(s) it has been allocated in the beacon.
3. If the coordinator has indicated the resources are not available, A may attempt to renegotiate.

Even if slots have not been allocated to a node, beacons may indicate the coordinator has data to send to a node. In this case, the node in question may send a data request during the CAP, to which the coordinator will reply with an ACK to indicate it will send the data (and therefore the node should not sleep). Upon correct reception of the data, the node will reply with an ACK. If no ACK is received after sending the data request, the node may sleep and try again in the next super-frame.

In non-beaconed mode, this operation continues for the entire super-frame.

TDMA - Reservation Based MAC

Involves splitting the timeline into slots, and assigning these to nodes which will only transmit in their designated slot. But how do we assign the slots?

Typically, we build a tree rooted at the sink, and assign slots depending on whether we are targeting broadcast or converge-cast.

Nodes in two-hop neighbourhood cannot transmit at the same time and so cannot have the same slot.
In broadcast, nodes further away from the sink must be assigned larger slots.
In converge-cast, nodes closer to the sink need to be assigned larger slots.

The chance of node failures must also be considered.

Network Layer

The Network Layer is responsible for finding the path to be followed by data packets from source node to destination node.

There are a few key concepts:

Loops - Combination of incorrect routing tables or information which causes a data packet to be routed endlessly in a loop between nodes.
Time-to-Live (TTL) - A value in a data packet which tells relays how long the packet should remain in the network before it is discarded. Usually measured as number of hops.
Beacon - Control message broadcast across the network.

Flooding

Very basic protocol - nodes which receive a given packet for the first time broadcast it to all neighbours. It is not forwarded by its intended recipient.

Pros
- Very simple.
- Efficient when rate of transmission is low, and when transmitting small packets.
- High reliability of delivery - packets have the opportunity of taking multiple routes through the network.
Cons
- Potential for very high overhead.
- Waste of bandwidth and energy.
- Potentially lower reliability at the MAC level.

Gossip

Similar to flooding, but nodes only forward to a subset of their neighbours. It is more scalable compared to flooding.

Distance Vector (DV) Routing

Operates using distance vectors - each node knows the cost to reach each of its one-hop neighbours. The Bellman-Ford algorithm can then be used to calculate the distances to route to other nodes in the network. This algorithm is used as it only requires information received from a node’s one-hop neighbours.

The algorithm operates as follows:

Compute costs of all direct links to the one-hop neighbours of a node.
Create initial routing table.
Create/update distance vector (the distances to other nodes in the network) using the distances to adjacent nodes.
Periodically broadcast the distance vector to the node’s local neighbourhood. Typically this period ranges from seconds to minutes.
Nodes hearing the DV use it to update their own routing tables, and return to step 3.

Link Breakages

DV updates no longer arriving may indicate to a node that a link has gone down. In this case:

The node will update its DV setting the cost to the supposedly failed node to infinity, and broadcasts.
Adjacent nodes update routing tables with this new information.
This continues until a new path is distributed throughout the network.

This can result in loops occurring until nodes learn about failed links.

Destination Sequenced Distance Vector (DSDV)

Similar to Distance Vector routing.
Each node maintains a routing table.
Updates are exchanged between neighbours at regular intervals or when important changes in topology occur.
Updates contain destination, metric, and increasing even sequence numbers, originated by the destination node.

When a node hears new information or information with a better metric than a currently stored route, it updates its table and increments the sequence number to the next even number.

If a node detects a broken link, the sequence number is incremented to the next odd number and the information is broadcast. When an update to resolve the broken link is discovered, it will have the next even sequence number and so will overwrite the odd-numbered information.

Fluctuations

Variations in the time taken for information to reach nodes can lead to fluctuations and unnecessary route advertisements being broadcast for a period of time, before the routing tables settle.

Can be resolved by measuring and recording a settling time, and waiting for twice this length of time before broadcasting any updates which are received.

Optimised Link State Routing (OLSR)

Proactive method of routing based on link state routing.

Nodes periodically flood the status of their one-hop links.
- Re-broadcast link state information received from neighbours.
- Keep track of link state information received from other nodes.
- Use this information to determine next hop to a destination.
Overhead of flooding link state packets is reduced by requiring fewer nodes to forward the information to all nodes.
- A broadcast from a node $x$ is only forwarded by its multipoint relays.
- Very efficient in highly dense networks.

Selecting Multipoint Relays

Can use a heuristic approach to select each node’s multipoint relays. Need to consider nodes within a two-hop neighbourhood of node $x$.

Initialise $MPRSet(x) = \emptyset$.
While $\exists n \in N_2(x)$ not covered by $MPRSet(x)$…
1. Add to $MPRSet(x)$ the node in $N_1(x)$ which would cause the greatest number of uncovered nodes in $N_2(x)$ to become covered.

Each node also maintains a list of neighbours which have selected it as a multipoint relay. A node will forward packets received from nodes in this list. These lists are also exchanged periodically in Topology Control (TC) packets, which are then used to compute routing tables.

Dynamic Source Routing (DSR)

Protocol based on flooding to find new routes.

Route Request (RR) packets flooded by a source node.
- Basic information in RR packet is Source, Destination, and Route traversed so far.
Intended destination replies with a Route Reply (RRp) packet upon receiving the first RR packet.
RRp is sent on the reverse route to that which it received the packet, with this route information appended.

Improvements

Promiscuous mode and caching can be used to improve the performance of DSR.

Intermediate nodes in a path can learn about routes to other destinations and store this information in a cache.
Upon receiving a RR packet, nodes can send back a RRp with the full route stored in their cache.
Can also learn about broken links and remove broken paths from cache.
TTL can be gradually increased for RR packets if no route is found.
Cache can speed up route discovery and reduce the amount of propagation of RR packets.

Note that RR packets should be forwarded only if they are received on a link which is known to be bi-directional. This is because otherwise, the RRp packet could not be reversed back through the network. If IEEE 802.11 is used, then links have to be bi-directional anyway, though.

Link Breakages

A Route Error (RE) packet is sent back to a sender by nodes involved in a route if a link breakage is detected.

The source will then flood the network to discover a new route, and intermediate nodes will update their caches if in promiscuous mode.

Multiple Routes

A destination may reply with RRp packets for every RR it receives, and then a metric may be used by the source to select the best route to use. Intermediary nodes may also reply with RRp’s from their caches in addition to forwarding the RR.

Ad-Hoc On-Demand Distance Vector Routing (AODV)

DSR includes the full route in packet headers, which results in large headers with little packet data. AODV attempts to improve on DSR by maintaining information about a path at the nodes, so that data packets do not need to include full routes.

Route Request (RR) packets are flooded as in DSR.
- When a node forwards a RR packet, it creates a pointer to the previous node towards the source.
- When a RRp packet is forwarded, it creates a pointer to the next node towards the destination.
AODV also uses caches and assumes bi-directional links.

Cached routes in AODV are different from DSR:

In AODV, each node maintains one entry per destination (whereas in DSR nodes can maintain multiple routes for a single destination).
Each entry in the AODV cache contains:
- Destination ID,
- Next hop,
- Destination sequence number,
- Timeout,
- Metric.
Cached routes are periodically purged.
- A node maintaining a pointer in a reverse path purges the information after a timeout interval. The timeout should be long enough to allow RRp packets to get back to the source.
- A node maintaining a pointer in a forward path purges the information if not used for an active route timeout interval.

AODV uses the same concept of sequence numbers as in DSDV. The destination sequence number specifies how fresh a route to the destination must be before it can be accepted by the source.

Information about link breakages are propagated using unsolicited RR packets with updated destination sequence number and metric.

Unsolicited RR packets are sent to active nodes in the path.
Active nodes update their cache by removing the broken path.
When the source node receives the unsolicited RR packet, it may initiate a new route discover the destination with updated source and destination sequence numbers.
- This ensure then RR packet builds a new, viable route, and no intermediate nodes reply if they still regard the existing route as valid.

Locally, routes can be fixed with local query (LQ) packets with a short TTL, where if a LQ finds the destination then the destination replies with a RRq with the new found route.

Temporally Ordered Routing Algorithm (TORA)

Based on destination-oriented Directed Acyclic Graphs (DAGs). The algorithm can establish routes quickly with multiple routes to a destination, and only reacts when necessary, not to every topological change.

Network is modelled as a graph $G = (N, L)$.
- $N$ is a finite set of nodes.
- $L$ is the set of communication links.
Each node has a unique identifier (ID).
Communication links are bidirectional and categorised as either:
- Undirected.
- Directed from node $i$ to node $j$.
  - That is, $i$ is upstream of $j$ and $j$ is downstream of $i$.
- Directed from node $j$ to node $i$.
  - That is, $j$ is upstream of $i$ and $i$ is downstream of $j$.
Each node $i$ has a set of neighbours $N_i \subseteq N$ defined as nodes $j$ such that $(i, j) \in L.$

There are three basic functions in the TORA algorithm:

Creating Routes - Uses demand-driven query/reply. Establishes a sequence of directed links leading from the source to the destination.
- This is done by forming a destination oriented DAG - the destination node being the sink of the graph.
- A query packet (QRY) is flooded through the network with destination ID.
- An updated packet (UDP) propagates back if routes exist.
Maintaining Routes - Uses link reversal algorithm.
- Reacts to topological changes in the network.
Erasing Routes - Occurs when partitions are detected in the network.
- Removes invalid directed links and routes.
- Clear packet (CLR) is flooded through the network.

At any point in time, an order quintuple is associated with each node $n_i$, $H_i = (ti, oid, ri, di, i)$.

$ti$ - Time tag set to time of link failure.
$oid$ - Unique originator ID that generates new reference level.
$ri$ - Single bit used to divide each of unique reference levels into 2 unique sub-levels.
$di$ - Integer to order nodes with respect to common reference level.
$i$ - Node ID.

This represents the hight of a node $n_i$ with respect to:

A reference level represented by the first 3 values.
- A new reference level is defined every time a node has no downstream neighbour.
A $d$ value important for reference level propagation.
Each node $i$ has a height $h_i$ such that the nodes may be totally ordered.
- Height is usually the distance to the destination (number of hops).
Node $j$ is considered downstream of node $i$ if $h_i > h_j$. Link $(i, j)$ is directed from $i$ to $j$.
Node $i$ is a local minimum when it has no outgoing links.
The ordering of NULL heights (unassigned hight values) always results in a DAG.

Data Centric Routing

Hard to assign specific addresses to motes in very large WSNs - address-based approaches used in ad-hoc networks are not suitable in this case.
Data centric routing is based on attribute-based naming - querying a phenomenon rather than an individual node.

Directed Diffusion

Data-centric method - routing for named data.
All nodes in the network are application-aware.

There are four main stages to the operation of this routing method:

Interest propagation
- The data request is provided by interest messages initiated by the sink.
- Interest messages are exploratory in nature and serve to detect nodes with matching data for a particular task.
- Nodes store interest messages in their interest cache and broadcast them to their neighbours.
- Messages will include fields such as when the interest expires, the interval to send the data, and perhaps an area the sink is interested in.
Gradient setup
- Gradients indicate the nodes from which the interest was received.
- Local rules may be defined to set up gradients:
  - All nodes from which the interest message was received.
  - Node from which the interest message was first received.
  - Nodes with the highest remaining energy.
- Interest messages indicate the required data from the sensors - for a specific period of time and with a specific data rate.
- Nodes which match the interest become source nodes and send data along the interest’s gradient route.
Reinforcement
- Initial interest messages are exploratory messages - intended for path setup and repair.
- Once the sink starts receiving data, it can reinforce a specific route by resending the interest to a specific neighbour.
  - A reinforced route allows receiving ‘real data’ at a higher data rate.
  - For example, the sink might choose the neighbour from whom it first received the latest event matching the interest.
Data delivery
- Source will continue to reinforce routes to continue receiving data.
- Routes can be negatively reinforced by not reinforcing them - the gradients will eventually expire.
- Can also send interest with lower data rate for negative reinforcement.

Rumor Routing

Event-based - ideal for dense WSNs. Nodes create and spread agents.
- Agent - Data packet that randomly walks spreading information about an event.
- Agents propagate and install routing information to events in each node they visit.
- Agents are propagated usually using unicast transmission with a TTL.
The sink, when interested in an event, sends out a query on a random walk.
- High probability of finding a pre-established route to the event.
Based on the idea of intersecting lines in a bounded rectangular region.
- Probability of any two random lines intersecting is about 69%.
- With five lines, this rises to 95%.
- So highly likely routes for events can be combined.
Each node, for each event, stores the number of hops along it is in the route and the node for the next hop for that route.
Queries are targeted to an event.
- If a node has a route towards the target event, it forwards the query along the route. If not, it is forwarded to a random neighbour.
- High chance eventually the query will reach a node which has heard about the event.

Hierarchical Routing

Data-centric routing methods with flat topologies suffer from data overload close to the sink as the node density increases - unequal use of energy.

Hierarchical routing assumes a hierarchical topology.
- Nodes grouped into clusters with cluster heads.
- Cluster heads aggregate data.

Low Energy Adaptive Clustering Hierarchy (LEACH)

Dynamically selects nodes as cluster heads to form clusters.
Communications inside a cluster are directed to the cluster head - data aggregation.
Cluster heads directly communicate with the sink (or other cluster heads).
The cluster head roles are dynamically changed so high energy consumption is spread across all nodes.
Operation of LEACH is controlled through rounds.
- Set-up Phase
- Steady State Phase

Set-up Phase

Each node decides whether or not to become a cluster head for the current round.
- Based on the desired percentage of cluster heads and the number of times the node has been a cluster-head so far.
- Each node chooses a random number in the range $[0, 1)$. If the number is less than a threshold, the node becomes a cluster-head for the current round.
After selection, cluster heads advertise their decision to their neighbours - CSMA based random access.
- Nodes become cluster members of the first cluster head from which they receive an advertisement.
- Nodes advertise information about their membership to the cluster-head.
Communication schedule is created, based on TDMA/CDMA.

Steady State Phase

Each cluster communicates using CDMA.
When a node decides to become a cluster-head, it chooses randomly from a list of spreading codes.
Cluster-heads inform all the nodes in the cluster to transmit using the chosen spreading code.

Collection Tree Protocol (CTP)

CTP is intended to be robust to stale route information and agile to link dynamics. Uses data-path validation - using data packets to dynamically probe and validate the consistency of its routing topology.

It also has a low overhead when the topology is stable.

Use of adaptive beaconing - extends the Trickle algorithm so it can be applied to routing control traffic.
As with Trickle, the exponential timer allows nodes to send very few control beacons when the topology is consistent, yet quickly adapt when the data-path discovers a possible problem.

Data-path Validation

Every node maintains an estimate of the cost of its route to a collection point.
Metric used is Expected Transmissions (ETX). Though, any similar gradient metric can be used.
A node’s cost is the cost of its next hop + the cost of the link to its next hop.
- Collection points advertise a cost of 0.
Each data packet contains the transmitter’s local cost estimate.
When a node receives a packet, it compares the transmitter’s cost to its own.
If a transmitter’s advertised cost is not greater than the receiver’s, then the sender’s topology information is stale and there may be a routing loop.
- Detection can occur very quickly - from the very first packet after loops occur.

Adaptive Beaconing

Stale routing information can be updated by sending control beacons.
Beacons contain the transmitter’s local cost estimate too.
Instead of using a global sequence number as in trickle, CTP using the routing cost gradient to control when to reset the timer interval.
- The routing layer resets the interval to $\tau_1$ when it detects stale information or loops.
- A node is asked to forward a data packet from a node whose ETX is not higher than its own.
- Its routing cost decreases significantly. This may mean a lower cost route may exist.
- Routing does not necessarily suppress message transmissions as in trickle (i.e. $k = \infty$ if we consider trickle).

Data Transmission

Requires…

Per-client queueing.
Hybrid send queue.
Transmit timer.
Packet summary cache.

CTP uses a very aggressive retransmission policy. By default, it will retransmit a packet up to 32 times. Instead of dropping the packet, CTP combines a retransmission delay with proactive topology repair to increase the chances of delivering the current packet.

Reprogramming Techniques for WSNs

Requirements of a WSN may change - e.g. the understanding of an environmental phenomenon may change with time, requiring changes in the WSNs.

Traditionally, reprogramming is done manually, usually node by node by sending the code from a PC directly to the memory of a sensor node - time consuming, sometimes not feasible. The solution is reprogramming with no physical contact with the sensor nodes, by disseminating the code wirelessly.

We desire approaches which are energy efficient, scalable, and topology independent.

Code delivery must be close to 100% reliable and reach all intended nodes.
Code transmissions are large compared to regular data.
Many senders may transmit at the same time resulting in possible collisions.

Therefore, we need strategies for accessing the medium, disseminating code, and making sure the code is disseminated correctly, all while saving energy.

Multihop Network Reprogramming Service

An advertisement message (ADV) is sent at random intervals by the node who receives a new code.
Only the nodes needing the code reply, with a download request (DLR).
If multiple nodes advertise the same code, a source node overhearing another ADV can sleep if the request counter value is greater at the other source node.
- This helps reduce the number of transmissions which are required.
Nodes needing the code reply to all ADVs received. A source node overhearing a DLR for another source can go to sleep if the request counter value is greater at the other source.
Codes may be subdivided into smaller segments if large.
- Nodes can then receive/transmit the segments sequentially.
- A source goes to sleep if it hears an advertisement for a lower segment ID than it is currently advertising.
Sources with a higher number of potential receivers are assigned higher priority, which avoids transmitting the same code by too many nodes.
- Sleeping is used to reduce energy consumption.
- In a neighbourhood, there is at most one sender transmitting the code at any given time.

Sending Packets with Pipelining

When a node becomes a sender, it broadcasts a start download message to announce this fact. Then, it starts sending the segment packet by packet.
Individual packets for the same segment may be received from different sources.
The download process ends when the receiver receives an end download message from the sender.
A Missing Vector at receivers keeps track of the packets received for a particular segment.
Similarly, Forward Vectors at sources keep track of a union of all the missing vectors received, and uses this to know what needs transmitting.

Backbone Approach

Select some nodes from the WSN to form a backbone, which is a connected, minimal, dominating set.
Dominating nodes control their neighbours.
Dissemination process now deals with a simple topology - from a node, route to the backbone, and routing in the backbone is simple since there are fewer nodes.
Want to minimise the number of sources and so ideally need to find the minimum connected dominating set (MCDS) to disseminate the data.

A dominating set $DS$ of a graph $G = (V, E)$ is a subset $V’$ of $V$ such that every vertex $v \in V$ is either in $V’$ or adjacent to some member of $V’$. A dominating set is connected if the sub-graph induced by it is connected. A minimum connected dominating set is a connected dominating set of minimum cardinality.

Solving MCDS is NP-hard, and is even hard to approximate.
Simple solution is the tree growing approach.
- Dominating set is constructed as a spanning tree.
- Nodes and edges are iteratively added until all nodes are covered.
- At each step, pick the vertex which will dominate the most nodes.
- This can result in worst-case performance for some graphs, however.

Multipoint Relays (MPR)

Multipoint relays provide a more resourceful alternative to traditional flooding.
MPRs of a node are indeed a dominating set, and the full set of MPRs in a network may form a connected dominating set.
- MPRs are computed using a source node as a reference which is why the set obtained might not actually be minimum.

Failure Detection

In a Synchronous system we assume the following conditions:

Processing - The time it takes for a process to execute a step is bounded and known.
- When a process takes a step, another process may take at most a bounded number of steps.
Delays - There is a known upper bound limit on the time it takes for a message to be received.
Clocks - The drift between a local clock and global real time clock is known and bounded.

In an asynchronous system, none of these assumptions are known to hold. When a process takes a step, another process may take at most an unbounded (but finite) number of steps.

In a partially synchronous system, the timing assumptions may be eventual.

Properties of Failure Detectors

Ability to detect all failures.
Strong Completeness - Eventually, every crashed process is permanently detected by every correct process.
Strong Accuracy - No process is suspected by any process before it has crashed.
Perfect Failure Detectors are both strongly complete and strongly accurate.

Implementation of Failure Detectors

Process periodically exchange heartbeat messages.
A process sets a timeout based on worst case round trip of a message exchange.
A process $i$ suspects a process $j$ (of a crash failure) if $j$ times out at $i$.
A process $i$ that delivers a message from a suspected process $j$ revises its suspicions and increases its timeout for $j$.

Algorithm Outline

Initialise timeouts for all processes $j$ in the system.
Periodically send ‘$i$ is alive’ to all other processes.
When a timer expires for process $j$, add $j$ to suspected list and reinitialise its timeout.
If heartbeat is received from a suspected process $j$, remove $j$ from suspected and increase its timeout and reinitialise with new timeout.
- If $j$ was not in suspected, just reset its timeout.

Neighbourhood View Consistency

The Neighbourhood View Consistency problem describes the ability for nodes in a given neighbourhood to agree on what nodes are alive and what nodes are unreachable.

The formal specification consists of two parts: safety and liveness.

Safety - Nothing bad happens (e.g. mutual exclusion).
- A working node is never removed from a neighbourhood.
Liveness - Something good eventually happens (e.g. termination).
- Every time a node fails, then eventually all other nodes in the neighbourhood will remove the failed node.

In the fault model assumed, we can have:

Node crashes.
Link (omission) failures.
Transient failures (those only lasting a short amount of time).

Solving NVC with a Perfect Failure Detector (PFD) is impossible.

A device capable enough to solve strong NVC should be able to:

Return the state of a node,
Return the neighbourhood of a node.

So as opposed to returning a list of suspects, it also returns the neighbourhood of each suspect.

It is impossible to implement a perfect PseudoCrash detector during memory failures either - only eventual strong accuracy and eventual strong completeness are possible.

A possible implementation is in the slides.

Weak NVC

The Weak NVC problem is defined as:

Given a network $G = (V, A)$, and two nodes $n, m \in V$, a program provides weak NVC for $G$ if every computation of the program satisfies:

Eventual Safety - There exists a time after which no working node is removed.
Weak Liveness - Every time a node $m$ pseudo-crashes, then eventually $\forall n \in N$, $n$ removes $m$ or a fault is raised.
Validity - A fault flag is raised only if there is a fault in the network.

Possible implementation is in the slides.