Designing heterogeneous multiprocessor embedded systems
following structure v[i] op w[i] or q op w[i] and the total
The goal of these metrics is to identify the structural properties of number of lines, where v and w are vectors, and op is an
a functionality focusing on the analysis of the control flow operator different from =.
Definition: Weak Harvard Degree (WHD
) is the ratio between the number of
For each method m, WHD
Definition: Control Flow Complexity (CFC
For each method m, CFC source lines that contain, outside a loop, expressions such as v[i]
is defined as the ratio between the
m op w[i] or q op w[i] and the total number of lines, where v and
number of source lines that contains loop or branch statement w are vectors, and op is an operator different from =.
and the total number of lines.
The value of such a metric is increased by variations in the GPP oriented metrics
execution flow due to decision points (i.e. loops and branches), The goal is to identify functionalities that significantly rely on
therefore a linear sequence of instructions has zero control flow operations that involve conditional dependent control flows,
complexity. complex data structures and complex I/O management.
Definition: Loop Ratio (LR )
Definition: Conditional Ratio (CR
For each method m, LR is defined as the ratio between the The Conditional Ratio of a method m is CR=CFC–LR where
number of source lines that contain loop statements and the total CFCm is the Control Flow Complexity and LRm is the Loop
number of lines. Ratio.
Such a metric allows discriminating between computational and Definition: I/O Ratio (IORm)
control oriented functionalities. Moreover, high LR values
m is the ratio between the
For each instance of method, IOR m
indicate the possibility of exploiting a spatially limited number of source lines that contain I/O operations (e.g. read,
computational unit by means of a compact implementation and a write, etc.) and the total number of lines.
strong component reuse. )
Definition: Structure Ratio (STR
DSP oriented metrics For each method m, the Structure Ratio is the ratio between the
The goal is to identify functionalities suitable to be executed by a number of structures declared and the total number of
DSP by considering those issues that exploit the most relevant declarations.
architectural features of such executor class: Circular Buffering, ASIC-like oriented metrics
MAC operations, and Super Harvard architecture. The goal is to identify regular functionalities that significantly
For the circular buffering, the goal is to identify subsets of the rely on operations that involve bit manipulation. Therefore, in
specification that access a linear data structure (one-dimensional addition to some of the previously defined concepts (i.e. LR, and
array, row or column of bi-dimensional array). The use of a DR for the type bit) the following metric is defined.
circular buffer is identified, more or less explicitly, by portions of m
code that try to shift an array of one or more positions. Definition: Bit Manipulation Rate (BMR )
is the ratio between the number of
For each method m, BMR
Definition: Strong Circularity Degree (SCD m
m source lines that contain bit manipulation operations (e.g. and,
is the ratio between the number of
For each method m, SCD
m or, xor, etc) and the total number of lines.
source lines that contain expressions of the form v[i]=v[i K] The information gathered by means of the metrics previously
and the total number of lines, where v is a vector (or a defined is organized in a global metric that allows a
row/column of a matrix), and K is a constant value. straightforward characterization of a functionality with respect to
Definition: Weak Circularity Degree (WCD
m each possible executor. Such a global metric, called affinity is
is the ratio between the number of
For each method m, WCD
m operatively defined in the following.
source lines that contain expressions of the form v[K]=f(v[i]) or The affinity
q=f(v[i]) and the total number of lines, where v is a vector (or a The affinity of a functionality can be expressed by a
row/column of a matrix), K is a constant value, and f(v[i]) is a normalization function applied to a linear combination of the
generic expression that involves v[i]. metrics, with weights that depend on the considered executor
For the MACs, the goal is to identify subsets of the specification class. Intuitively, the affinity towards a GPP executor depends
that express a particular mix of operations (i.e. a sum and a primarily on: the I/O Ratio, the Conditional Ratio, the Structure
multiplication) that a DSP can perform concurrently. Ratio, and the number of declared variables of GPP compatible
Definition: Strong MAC Degree (SMD
m type. The affinity towards a DSP executor primarily depends on:
For each method m SMD is the ratio between the number of
m the degrees of circularity, Harvard, and MAC, the Loop Ratio,
source lines inside a loop that contain expressions of the form and the number of declared variables of DSP compatible built-in
=s +s s and the total number of lines.
s 1 1 y type. The affinity towards an ASIC-like executor depends on: the
Definition: Weak MAC Degree (WMD ) Loop Ratio, the Bit Manipulation Ratio, and the number of
is the ratio between the number of
For each method m WMD variables of bit type. Therefore, it is possible to evaluate the
source lines that contain, outside a loop, expressions of the form affinity for each method m as follows:
s =s +s s and the total number of lines.
1 1 x y
For the concurrent memory access, the goal is to identify subsets A f W C
of the specification able to exploit concurrent memory accesses to
instructions and data, as provided by the Super Harvard where:
A A A A
GPP DSP HW
m m m
Definition: Strong Harvard Degree (SHD m
For each method m, SHD is the ratio between the number of
source lines that contain, inside a loop, expressions with the
indicates in general (the three average values in
while the A
m m m
C SCD WCD SHD WHD SMD WMD IOR CR LR BMR DR DR DR STR
m m m m m m m m m m m
m bit z z Table 1 are nearly the same) those procedures that exploit some
z char , string z int | real features associated with the ASIC executor class.
0 0 0 0 0 0 1 1 0 0 0 1 1 1
Table 1. Affinity average values
W 1 1 1 1 1 1 0 0 1 0 0 0 1 0
0 0 0 0 0 0 0 0 1 1 1 0 0 0 Average Average Average
DSP aplications) DSP applications)
(on the whole suite) (only the (all but the
The weights of the matrix W are set to 1 when the associated 0,46 0,38 0,49
GPP 0,25 0,57 0,10
metric is meaningful for a given executor class, 0 otherwise. In DSP 0,27 0,28 0,27
this way, the affinity represents the sum of all the contributions
determined by each relevant metric. Since such a sum could be 4.1. Metrics and Partitioning
greater than one, a function should be applied to obtain values in To give the flavor on how the methodology can actually work on
the [0, 1] interval allowing a direct comparison between affinity real designs and to show its effectiveness, an example of co-
values related to different executors. analysis and system-level partitioning is here reported.
The adopted normalization function is the arctangent one because The considered application consists of 52 methods and its
it is limited to the interval [-/2, when x varies from - to Procedure Interaction Graph is represented in Figure 2. The
so. So, to normalize the affinity in the interval [0, 1] it should be target architecture is composed of an unconstrained number of
scaled of a factor. Moreover, to take into account that a value GPP, DSP and FPGA. Starting from an annotated VCG (with
of 1 for a single relevant metric means a strong matching between affinity, load and communication cost data), the partitioning tool
the functionality and the executor a proper coefficient is builds its procedure-level internal model. The next action of the
multiplied to the x in order to obtain an affinity equal to 0.9 in tool has been the cost function minimization based on a genetic
correspondence with x=1. Finally, to better discriminate between algorithms strategy.
low and high affinity values, a quadratic form is introduced,
leading to the following normalization function:
f x P1 P6 P9 P12
T P2 P3 P7 P8 P10 P11 P13
The function f(x), when applied to , provides affinity
values that are directly comparable and therefore it can be used to P18
P4 P5 P16 P17
P28 P22 P31 P34 P37
select the best executors class for each functionality. P47
P29 P30 P23 P24 P32 P33
4. METHODOLOGY VALIDATION P43 P48 P49
In order to support the presented co-analysis methodology, and to P25
validate the methodology itself, a tool has been developed and P52
P41 P50 P51
P42 P44 P45
integrated in the tool suite supporting the design flow of Figure 1. Figure 2. Procedure Interaction Graph
Due the wide diffusion of C language (especially in the DSP
field), a meaningful validation has been setup based on a C test Table 2 shows the affinity values of each procedure. The load
suite. A tool has been developed and integrated with a C/C++ depends on the imposed timing constraints and the relative
code analyzer (GENOA, ). The tool computes the affinity physical costs are 1 for a GPP, 1.2 for a DSP and 2 for a FPGA.
values for each system functionality that are then provided to the The goal of this validation is to check the behavior of the
system design exploration tools. partitioning tool for different timing constraints and different cost
The adopted benchmark suite is composed of 311 procedures; function weights in order to highlight, in particular, the role of the
each one of them representing a specific functionality. A subset of affinity values. Different timing constraints have been imposed on
these procedures (i.e. 100) has been selected from applications the execution time of the whole application in the following way:
oriented to digital signal processing and, therefore, they represent (evaluated by simulation for a single GPP
with respect to a T
a valid sample of the main functionalities involved in these system) in different experiments the constraints have been 90%
applications (e.g. Fast Fourier Transform, filtering, convolutions, , and 50% T . The latter constraint aims at forcing the
etc.). The other procedures are representative of a general set that partitioning tool to exploit the concurrency by increasing the
contains functionalities related to the field of coding, string number of executors.
manipulation, common operations (e.g. sorting) and parts of The weight of the affinity is variable and it assumes different
videogames. During the validation process, the values of the values during several experiments in order to enforce at each step
metrics previously defined have been collected, and the affinity weights of the affinity index in the cost function. For each value
value of each functionality has been evaluated in the normalized of the affinity weight, Table 3 and Table 4 report the iteration
form. (each iteration works with a finer-granularity) that has found the
Interesting considerations can be made by analyzing the averages minimum value for the considered cost function and the related
of the affinity values on the whole test suite, for the DSP timing simulation result.
for the DSP
applications, and for the others (see Table 1). A . Such a
Table 3 shows the results for the constraint 90% T
applications is fairly larger than the other affinity values and the choice enforces the presence of an architecture with more than
A values evaluated for the other application cases. It is worth one executor in order to reduce the execution time and, in fact, the
DSP has the largest average of the whole set,
noting that A timing constraint is always largely met. With lower affinity
revealing the general purpose nature of the related executors class, weights (e.g. 0 and 2), the partitioning does not consider DSPs
+1 anno fa
This paper considers the problem of designing heterogeneous multiprocessor embedded systems. The focus is on a step of the design flow: the definition of innovative metrics for the analysis of the system specification to statically identify the most suitable processing elements class for each system functionality. Experimental results are also included, to show the applicability and effectiveness of the proposed methodology.
I contenuti di questa pagina costituiscono rielaborazioni personali del Publisher Atreyu di informazioni apprese con la frequenza delle lezioni di Sistemi embedded e studio autonomo di eventuali libri di riferimento in preparazione dell'esame finale o della tesi. Non devono intendersi come materiale ufficiale dell'università L'Aquila - Univaq o del prof Pomante Luigi.
Acquista con carta o conto PayPal
Scarica il file tutte le volte che vuoi
Paga con un conto PayPal per usufruire della garanzia Soddisfatto o rimborsato