# Designing heterogeneous multiprocessor embedded systems

Anteprima

### ESTRATTO DOCUMENTO

following structure v[i] op w[i] or q op w[i] and the total

Structural metrics

The goal of these metrics is to identify the structural properties of number of lines, where v and w are vectors, and op is an

a functionality focusing on the analysis of the control flow operator different from =.

complexity. )

Definition: Weak Harvard Degree (WHD

m

) is the ratio between the number of

For each method m, WHD

Definition: Control Flow Complexity (CFC

m m

For each method m, CFC source lines that contain, outside a loop, expressions such as v[i]

is defined as the ratio between the

m op w[i] or q op w[i] and the total number of lines, where v and

number of source lines that contains loop or branch statement w are vectors, and op is an operator different from =.

and the total number of lines.

The value of such a metric is increased by variations in the GPP oriented metrics

execution flow due to decision points (i.e. loops and branches), The goal is to identify functionalities that significantly rely on

therefore a linear sequence of instructions has zero control flow operations that involve conditional dependent control flows,

complexity. complex data structures and complex I/O management.

)

Definition: Loop Ratio (LR )

Definition: Conditional Ratio (CR

m m

For each method m, LR is defined as the ratio between the The Conditional Ratio of a method m is CR=CFC–LR where

m

number of source lines that contain loop statements and the total CFCm is the Control Flow Complexity and LRm is the Loop

number of lines. Ratio.

Such a metric allows discriminating between computational and Definition: I/O Ratio (IORm)

control oriented functionalities. Moreover, high LR values

m is the ratio between the

For each instance of method, IOR m

indicate the possibility of exploiting a spatially limited number of source lines that contain I/O operations (e.g. read,

computational unit by means of a compact implementation and a write, etc.) and the total number of lines.

strong component reuse. )

Definition: Structure Ratio (STR

m

DSP oriented metrics For each method m, the Structure Ratio is the ratio between the

The goal is to identify functionalities suitable to be executed by a number of structures declared and the total number of

DSP by considering those issues that exploit the most relevant declarations.

architectural features of such executor class: Circular Buffering, ASIC-like oriented metrics

MAC operations, and Super Harvard architecture. The goal is to identify regular functionalities that significantly

For the circular buffering, the goal is to identify subsets of the rely on operations that involve bit manipulation. Therefore, in

specification that access a linear data structure (one-dimensional addition to some of the previously defined concepts (i.e. LR, and

array, row or column of bi-dimensional array). The use of a DR for the type bit) the following metric is defined.

circular buffer is identified, more or less explicitly, by portions of m

code that try to shift an array of one or more positions. Definition: Bit Manipulation Rate (BMR )

m

is the ratio between the number of

For each method m, BMR

)

Definition: Strong Circularity Degree (SCD m

m source lines that contain bit manipulation operations (e.g. and,

is the ratio between the number of

For each method m, SCD

m or, xor, etc) and the total number of lines.

source lines that contain expressions of the form v[i]=v[i K] The information gathered by means of the metrics previously

and the total number of lines, where v is a vector (or a defined is organized in a global metric that allows a

row/column of a matrix), and K is a constant value. straightforward characterization of a functionality with respect to

)

Definition: Weak Circularity Degree (WCD

m each possible executor. Such a global metric, called affinity is

is the ratio between the number of

For each method m, WCD

m operatively defined in the following.

source lines that contain expressions of the form v[K]=f(v[i]) or The affinity

q=f(v[i]) and the total number of lines, where v is a vector (or a The affinity of a functionality can be expressed by a

row/column of a matrix), K is a constant value, and f(v[i]) is a normalization function applied to a linear combination of the

generic expression that involves v[i]. metrics, with weights that depend on the considered executor

For the MACs, the goal is to identify subsets of the specification class. Intuitively, the affinity towards a GPP executor depends

that express a particular mix of operations (i.e. a sum and a primarily on: the I/O Ratio, the Conditional Ratio, the Structure

multiplication) that a DSP can perform concurrently. Ratio, and the number of declared variables of GPP compatible

)

Definition: Strong MAC Degree (SMD

m type. The affinity towards a DSP executor primarily depends on:

For each method m SMD is the ratio between the number of

m the degrees of circularity, Harvard, and MAC, the Loop Ratio,

source lines inside a loop that contain expressions of the form and the number of declared variables of DSP compatible built-in

x.

=s +s s and the total number of lines.

s 1 1 y type. The affinity towards an ASIC-like executor depends on: the

Definition: Weak MAC Degree (WMD ) Loop Ratio, the Bit Manipulation Ratio, and the number of

m

is the ratio between the number of

For each method m WMD variables of bit type. Therefore, it is possible to evaluate the

m

source lines that contain, outside a loop, expressions of the form affinity for each method m as follows:

.

s =s +s s and the total number of lines.

1 1 x y

## T T

For the concurrent memory access, the goal is to identify subsets A f W C

m m

of the specification able to exploit concurrent memory accesses to

instructions and data, as provided by the Super Harvard where:

architectures [8].

## A A A A

## GPP DSP HW

m m m

)

Definition: Strong Harvard Degree (SHD m

m

For each method m, SHD is the ratio between the number of

m

source lines that contain, inside a loop, expressions with the

indicates in general (the three average values in

while the A

HW

m m m

## C SCD WCD SHD WHD SMD WMD IOR CR LR BMR DR DR DR STR

m m m m m m m m m m m

m bit z z Table 1 are nearly the same) those procedures that exploit some

z char , string z int | real features associated with the ASIC executor class.

0 0 0 0 0 0 1 1 0 0 0 1 1 1

Table 1. Affinity average values

W 1 1 1 1 1 1 0 0 1 0 0 0 1 0

0 0 0 0 0 0 0 0 1 1 1 0 0 0 Average Average Average

DSP aplications) DSP applications)

(on the whole suite) (only the (all but the

The weights of the matrix W are set to 1 when the associated 0,46 0,38 0,49

## A

GPP 0,25 0,57 0,10

## A

metric is meaningful for a given executor class, 0 otherwise. In DSP 0,27 0,28 0,27

## A HW

this way, the affinity represents the sum of all the contributions

determined by each relevant metric. Since such a sum could be 4.1. Metrics and Partitioning

greater than one, a function should be applied to obtain values in To give the flavor on how the methodology can actually work on

the [0, 1] interval allowing a direct comparison between affinity real designs and to show its effectiveness, an example of co-

values related to different executors. analysis and system-level partitioning is here reported.

The adopted normalization function is the arctangent one because The considered application consists of 52 methods and its

/2]

it is limited to the interval [-/2, when x varies from - to Procedure Interaction Graph is represented in Figure 2. The

so. So, to normalize the affinity in the interval [0, 1] it should be target architecture is composed of an unconstrained number of

/2

scaled of a factor. Moreover, to take into account that a value GPP, DSP and FPGA. Starting from an annotated VCG (with

of 1 for a single relevant metric means a strong matching between affinity, load and communication cost data), the partitioning tool

the functionality and the executor a proper coefficient is builds its procedure-level internal model. The next action of the

multiplied to the x in order to obtain an affinity equal to 0.9 in tool has been the cost function minimization based on a genetic

correspondence with x=1. Finally, to better discriminate between algorithms strategy.

low and high affinity values, a quadratic form is introduced,

leading to the following normalization function:

2

atan x

2

f x P1 P6 P9 P12

Main P15

2 P14

T P2 P3 P7 P8 P10 P11 P13

## W C

The function f(x), when applied to , provides affinity

m P46

values that are directly comparable and therefore it can be used to P18

P4 P5 P16 P17

P28 P22 P31 P34 P37

select the best executors class for each functionality. P47

P38 P39

P35 P36

P29 P30 P23 P24 P32 P33

4. METHODOLOGY VALIDATION P43 P48 P49

P40

In order to support the presented co-analysis methodology, and to P25

P27 P26

validate the methodology itself, a tool has been developed and P52

P41 P50 P51

P42 P44 P45

integrated in the tool suite supporting the design flow of Figure 1. Figure 2. Procedure Interaction Graph

Due the wide diffusion of C language (especially in the DSP

field), a meaningful validation has been setup based on a C test Table 2 shows the affinity values of each procedure. The load

suite. A tool has been developed and integrated with a C/C++ depends on the imposed timing constraints and the relative

code analyzer (GENOA, [12]). The tool computes the affinity physical costs are 1 for a GPP, 1.2 for a DSP and 2 for a FPGA.

values for each system functionality that are then provided to the The goal of this validation is to check the behavior of the

system design exploration tools. partitioning tool for different timing constraints and different cost

The adopted benchmark suite is composed of 311 procedures; function weights in order to highlight, in particular, the role of the

each one of them representing a specific functionality. A subset of affinity values. Different timing constraints have been imposed on

these procedures (i.e. 100) has been selected from applications the execution time of the whole application in the following way:

oriented to digital signal processing and, therefore, they represent (evaluated by simulation for a single GPP

with respect to a T

## REF

a valid sample of the main functionalities involved in these system) in different experiments the constraints have been 90%

applications (e.g. Fast Fourier Transform, filtering, convolutions, , and 50% T . The latter constraint aims at forcing the

## T

## REF REF

etc.). The other procedures are representative of a general set that partitioning tool to exploit the concurrency by increasing the

contains functionalities related to the field of coding, string number of executors.

manipulation, common operations (e.g. sorting) and parts of The weight of the affinity is variable and it assumes different

videogames. During the validation process, the values of the values during several experiments in order to enforce at each step

metrics previously defined have been collected, and the affinity weights of the affinity index in the cost function. For each value

value of each functionality has been evaluated in the normalized of the affinity weight, Table 3 and Table 4 report the iteration

form. (each iteration works with a finer-granularity) that has found the

Interesting considerations can be made by analyzing the averages minimum value for the considered cost function and the related

of the affinity values on the whole test suite, for the DSP timing simulation result.

for the DSP

applications, and for the others (see Table 1). A . Such a

Table 3 shows the results for the constraint 90% T

## DSP REF

applications is fairly larger than the other affinity values and the choice enforces the presence of an architecture with more than

A values evaluated for the other application cases. It is worth one executor in order to reduce the execution time and, in fact, the

DSP has the largest average of the whole set,

noting that A timing constraint is always largely met. With lower affinity

## GPP

revealing the general purpose nature of the related executors class, weights (e.g. 0 and 2), the partitioning does not consider DSPs

### DESCRIZIONE DISPENSA

This paper considers the problem of designing heterogeneous multiprocessor embedded systems. The focus is on a step of the design flow: the definition of innovative metrics for the analysis of the system specification to statically identify the most suitable processing elements class for each system functionality. Experimental results are also included, to show the applicability and effectiveness of the proposed methodology.

I contenuti di questa pagina costituiscono rielaborazioni personali del Publisher Atreyu di informazioni apprese con la frequenza delle lezioni di Sistemi embedded e studio autonomo di eventuali libri di riferimento in preparazione dell'esame finale o della tesi. Non devono intendersi come materiale ufficiale dell'università L'Aquila - Univaq o del prof Pomante Luigi.

Acquista con carta o conto PayPal

Scarica il file tutte le volte che vuoi

Paga con un conto PayPal per usufruire della garanzia Soddisfatto o rimborsato