Universitá degli studi di Trento
Trento, Italy
Secure Data Aggregation
A report presented in partial fulfillment of the requirement for the course
Network Security
September 2005
Christian Marzadro
Mirko Muro
Table of Contents
1 Introduction 1
2 Attack Model 1
3 Data Aggregation 2
3.1 Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3.2 Hierarchical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
4 Key Management and Authentication 5
4.1 Key Symmetric vs Asymmetric . . . . . . . . . . . . . . . . . . . . . . . . . 5
4.2 In Cluster Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4.3 In Hierarchical Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
5 The Mean 8
6 The Sum 14
7 The Minimum 15
8 The Median 17
9 Conclusion 21
Bibliography 29
ii
1 Introduction
Sensor networks promise viable solutions to many monitoring problems. Real time traf-
fic monitoring, wildfire tracking, wildlife monitoring, or building safety monitoring are the
field where found the sensor networks. However, the practical deployment of sensor net-
works run into many challenges imposed by real-world demands. Sensor networks deployed
for mission-critical applications are potential targets for malicious attacks. In fact, in many
applications sensors are deployed in open environments, and hence are vulnerable to phys-
ical attacks, potentially compromising the sensor’s cryptographic keys. These sensors are
not equipped with tamper-resistant hardware due to cost constraints. When the sensors are
deployed for battlefield surveillance and forest fire monitoring, we would that the sensor net-
work must not only report each relevant event promptly, but also reject false reports injected
by attackers. The primary security challenges for wireless sensor networks lie in addressing
the conflict between limited resources (energy, computational overhead, memory, etc.) and
security requirements. The sensors are usually simples, low powered devices that can trans-
mit only within small range (cluster) of their location, and so (usually) they can not report
the measurements directly to the distant home server. Thus, a sensor-enhanced is often used
as an intermediary between the base station and the sensor nodes. Particularly challenging
for information aggregation in large sensor networks is the relationship between the resource
constraints and security issues. Message aggregation can reduce communication overhead
significantly, but message aggregation makes security more difficult. Aggregation collects
results from several sensors and calculates a smaller message that summarizes the important
information from a group of sensors. For example, suppose the operator is interested in the
average sensor reading for some value in the network. Given a query, it may be unnecessary
and inefficient to return all raw data collected from each sensor, due to the communication
constrains; instead, information should be processed and aggregated within the network and
only processed and aggregated information is returned. In the case where the sensor network
is too large, it possible subdivide it in clusters and build a hierarchical aggregator to enable
the aggregation, due to the capable to handle the whole network by a single aggregator.
In the Section 2 we present a threat models (like Stealthy Attack). In Section 3 and Section
4 we deal on respectively the data aggregation scheme and their key distribution. Whereupon,
in the Section 5, Section 6, Section 7 and Section 8 are listed a series aggregation function,
respectively Mean, Sum, Minimum and Median function. Finally, Section 9 compares the
papers and tries to explain the advantages and disadvantages for the all methods analysed.
2 Attack Model
In sensor networks an attacker can perform a wide variety of attacks. We focus on stealthy
attacks, where the attacker’s goal is to make the home server accept false aggregation results,
which are significantly different from the true results determined by the measured values,
while not being detected by the home server.
We consider a setting with a polynomially bounded attacker, which can corrupt some of
the sensors as well as the aggregator. Actions of a corrupted device are totally determined by
the adversary. Some sensors may be compromised and report wrong values that will affect
2
the aggregation result. If a corrupted sensor simply reports a wrong value, it may be difficult
to detect the misbehavior since such a detection may require application/semantics specific
knowledge.
An attacker can manipulate, without limit, the result of the aggregation operation, gaining
complete control over the computed aggregate. For instance, any protocol that computes
the average, sum, minimum, or maximum function is insecure against malicious data, no
matter how these functions are computed. In a sensors network formed by a lot of nodes, the
same stimulus being detected by multiple sensors is a necessary condition that enables cross-
verification of reported events in the presence of compromised nodes. If there is a single node
that is monitoring an area, it would be difficult verify whether an event reported by this single
node is real or forged. Only with multiple sensing nodes it’s possible cross-check the ground
truth. Under the assumption that the data from the sensors can be trusted, it’s assumed that
t
within a cluster upto nodes could be compromised.
All articles assume that base stations remain trustworthy and unassailable. This assump-
tion seems plausible: base stations are rarer and hence we can spend more on them, so it may
be feasible to enclose them in high-quality tamper resistant enclosures or to place them in
physically secure, access-controlled locations.
Another configuration is when we have all sensors that can reach the base station without
use the sink model, and make jump sensor by sensor to forward the information to the base
station. This is a good way to avoid the aggregation problem since the aggregation function
is made by the powerful base station. But this scheme is not applicable for the all wireless
sensor networks, because the sensors deployed are usually cheap with low performance and
sparse also far from the base station. There are scheme that can compute the aggregation for
these cases. But if they are bypass by an attacker, it could make accept false result to the base
station. Thus, it’s necessary to build a system that can allow the base station to verify the
correctness of the data coming from the sensors. In this report we don’t consider the routing
of information inside the network and how it can be implemented. The hypothesis is that the
system is able to forward the information from the sensors to the base station, while there are
a limited percentage of compromised nodes.
3 Data Aggregation
Collects results from several sensors and calculates a smaller message that summarizes the
important information from a group of sensors it’s called aggregation. For example, suppose
the operator is interested in the average sensor reading for some value in the network.
There are two type of aggregation:
• Out-Network Aggregation
• In-Network Aggregation
In the first case the base station performs an aggregation computation to obtain the ag-
1 ). Thus, sensor nodes act as data sources, and the base station acts as a
gregate (like in RAS
1 “Resilient Aggregation in Sensor Networks” [8] 3
n
sink. It’s considered only a single base station and associated sensor nodes. It’s ignored the
structure of the multi-hop network, assuming only that each sensor node has a separate link
to the base station. This abstracted architecture is depict in Fig.1. The main problem of this
Fig. 1: An abstract sensor network architecture, with inessential underlying physical struc-
tures abstracted away. We have n sensor nodes (the small circles), each with a separate secure
ith
channel to a single trusted base station (the large solid square). The sensor sends measure-
f
x to the base station, and the base station uses the function to compute the aggregate
ment i = 9.
y. n
In this picture, [8]
configuration is that the aggregator forwards to the home server all data and authentication
information from each sensor. Given all the data, the home server can verify authenticity of
each data item, and answer all the statistical queries locally. However, the transmission is
increased and consequentially the power consumption of the whole sensors network.
In the second case, In-Network Aggregation reduces the communication overhead, only
processed and aggregated information is returned to the base station. It’s possible subdivide
the problem in two:
• Cluster
• Hierarchical
3.1 Cluster
The sensor networks allows each of the nodes to organize themselves into clusters once de-
ployed. There are some nodes called aggregators or cluster-head that help aggregating in-
formation requested by a query. The aggregator collects the data from sensors and locally
computes the aggregation result. [3] propose to send at the base station also the prove that
2 where a data fusion
the sensor work correctly. Similar configuration are present in WBA
node receives data from a number of sensors, conducts data fusion, and then sends the result
(decision) to the base station with the help of two witness.
If the Query is local, i.e. the detection of an event, we can consider the cluster like
3 paper, is a normal
the number of sensor that reveal it. The Cluster-head described by SEF
2 “A Witness-Based Approach For Data Fusion Assurance In Wireless Sensor Networks” [9]
3 “Statistical En-route Filtering of injected False Data in Sensor Network, 2004” [4]
4
sensor elected from the other sensors like it. The number of the sensors that take part is low
and so the number of data to compute, so the Cluster-Head doesn’t require high computation
capacity.
Fig. 2: An example sensor network. Suppose we want to monitor three areas of interest, the
road, the river, and the munition plant, by deploying a cluster of sensor nodes (filled circles)
in each area. The base station sends commands or queries to the sensor nodes, and receives
reports from them. All the communications are relayed by some forwarding nodes (blank
circles). [10]
3.2 Hierarchical 4
The hierarchic scheme is only presented by SAW where the sensors network is presented
like a tree diagram. The leaves send the measure to their parent that compute the aggregation
and send the result to its parent. SAW says:
“Keep in mind, though, that the benefits of our protocol come when there are
a large number of nodes arranged in a deep tree, so that many readings can be
aggregated in a single message.”
In this paper, for simplicity, they assume only leaf nodes are collecting sensor readings,
and intermediate nodes are just aggregating and forwarding data.
4 “Secure Aggregation for Wireless Sensor Networks” [5] 5
4 Key Management and Authentication
The configuration used by all protocols is that each node is initialized, before deployment,
with a symmetric secret key. The base station establishes secrets with the ad hoc wireless
nodes before deployment.
4.1 Key Symmetric vs Asymmetric
Security issues in mobile ad hoc networks are similar to those in sensor networks, but the
defense mechanisms developed for ad hoc networks are not directly applicable to sensor
networks. Ad hoc network security mechanisms are based on public key cryptography. Con-
ventional public-key algorithms can help set up and manage keys in a network, but these
algorithms are not feasible in a sensor network because of their communication and compu-
tation complexity.
Almost all protocol presented say that is a risk to store the same key on every device to
enable encryption or authentication, since an adversary who recovers the key from a single
device will be able to control the entire network. Thus, is necessary to use different keys for
each sensor.
4.2 In Cluster Scheme
5
In SIA , the authors assume that each sensor has a unique identifier and shares a separate
secret cryptographic key with the base station and with the aggregator.
O(n)
The base station and the aggregator doesn’t need to store keys. In fact, they store
K
K and , and each sensor node stores the shared keys MAC (node
simply a master key B A K B
(node ID), where MAC is a secure message authentication code that is used
ID) and MAC
K A
here as a pseudo-random function. To verify the authenticity of each sensor reading, each
sensor shares a key with the aggregator. With the distribution of the keys it’s preventing only
sensor impersonation, and not data come from a damage or currupt sensor.
A new approach is that the authors of SIA call aggregate-commit-prove:
“aggregators help computing aggregation of sensor nodes’ raw data and reply to
the home server with the aggregation result together with a commitment to the
collection of data. The commitment to the input data ensures that the aggregator
uses the data provided by the sensors, and that the statement to be verified by the
home server about the correctness of computed results is meaningful.”
One efficient way of committing to the data is a Merkle hash-tree construction [6], [7].
6 algorithm. This is a efficient way to reduce the com-
This method is used also in the DAV
munication overhead between the network and the base station.
All the collected data are like the leaf nodes of the tree, and the aggregator then computes
a binary hash tree starting from them. In other words, each internal node in the hash tree is
computed as the hash value of the concatenation of the two child nodes. The root of the tree is
5 “SIA: Secure Information Aggregation and verification protocol in Sensors Network” [3]
6 “A Secure Data aggregation and Verification Protocol for Sensor Networks” [2]
6
called the ‘commitment’ [3] of the collected data. Since the hash function in use is collision
resistant, once the aggregator commits to the collected values, he cannot change any of the
collected values. Fig.3 pag.6 gives an example of a Merkle hash tree.
Fig. 3: Merkle hash tree used to commit to a set of values. The aggregator constructs the
, ..., m
m . To lower the size of verification
Merkle hash tree over the sensor measurements 0 7
information, the aggregator first hashes the measurements with a cryptographic hash function,
= ),
H(m
v assuming that the size of the hash is smaller than the size of the data. To
e.g., 3,0 0
construct the Merkle hash tree, each internal value of the Merkle hash tree is derived from its
= || ).
H(v v
v The Merkle hash tree is a commitment to all
two child nodes: i,j i+1,2j i+1,2j+1 v , a verifier can authenticate any leaf value
the leaf nodes, and given the authentic root node 0,0
by verifying that the leaf value is used to derive the root node. For example, to authenticate
m v , v , v m
m , the aggregator sends along with , and is authentic
the measurement 5 5 3,4 2,3 1,0 5
= || || )) || )).
H(v H(H(v H(m v
v [3]
if the following equality holds: 0,0 1,0 3,4 5 2,3
A different point of view is to use an asymmetric system with the private key subdivided
and another key for the sensor. In DAV [2] article they propose a protocol named Cluster
Key Establishment that generates a secret cluster key for each cluster. Each sensor node only
has a part of the secret cluster key and the cluster key is hidden from each node. The public
key for the secret cluster key is known to all nodes within the cluster as well as the base
station. Since the secret cluster key is hidden from all nodes, attacks on the cluster key are
not possible.
In WBA, in order to prove the validity of the fusion result, the fusion node has to provide
proofs from several witnesses. A witness is one who also conducts data fusion like a data
fusion node.
“They assume that the data fusion node and witness nodes share a secret key with
F
the base station. Let denote the data fusion node. Assume that we have chosen 7
, ..., w k , ..., k
m w , and represent the MAC keys they share with
witnesses, 1 1
m m
the base station. To reduce energy consumption in this scheme, they analyzed
and computed the minimum length needed for the Message Authentication Code
(MAC) to achieve a predefined level of security.”
This configuration type it’s good because the number of bits used for MACs does not
increase linear
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.