Embedded Systems - Advanced Operating Systems - Notes

Notes from the courses "Embedded Systems 1" and "Advanced Operating Systems" taught by Professor William Fornaciari at Politecnico di Milano. The notes are written in LaTeX and cover all the topics of the courses, including NoC, Wireless Charging, Batteries, SoC, FPGA and ASIC design flow.

Embedded Systems exam, lecturer Prof. W. Fornaciari



Buffers start to be used at the SA cycle because the upstream SA allocates flits in the downstream router. The flit is physically moved away from the buffer of the input port during the ST stage.

Figure 9: Credits usage.

Credit-based flow control

The upstream router keeps track of the available buffer slots in the downstream one using credits. Whenever a flit is sent to the downstream router a credit is used. The number of cycles a credit takes to get back is given by: UP.ST + UP.LT + DW.BW + DW.VA + DW.SA + 1 (or more) cycle to move back across the link + 1 cycle to be stored in the output port of the upstream. In the next cycle the credit is ready to be used. The buffer needs at least 7 slots because the same slot can be used again after the seventh cycle.
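The credit round trip above can be added up in a short sketch; the stage names and the one-cycle latencies come from the text, while the dictionary layout itself is just illustrative:

```python
# Credit round-trip sketch: the minimum buffer depth equals the number of
# cycles a credit needs to travel the whole loop back to the upstream router.
CREDIT_LOOP_CYCLES = {
    "UP.ST": 1,        # flit crosses the upstream switch traversal stage
    "UP.LT": 1,        # link traversal towards the downstream router
    "DW.BW": 1,        # downstream buffer write
    "DW.VA": 1,        # downstream virtual-channel allocation
    "DW.SA": 1,        # downstream switch allocation frees the slot
    "credit_link": 1,  # credit moves back across the link
    "credit_store": 1, # credit stored in the upstream output port
}

def min_buffer_slots(loop=CREDIT_LOOP_CYCLES):
    """Buffer depth needed so the link never stalls for lack of credits."""
    return sum(loop.values())

print(min_buffer_slots())  # 7
```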

On/Off flow control

The upstream router does not have precise information about each downstream buffer. The downstream router sends on and off signals to the upstream according to the state of its buffers. It is a coarser mechanism than the credit-based one.

Ack/Nack flow control

The delivery of each flit has to be confirmed by an acknowledgment sent to

the upstream router which keeps the sent flit until an ACK is received.

2.7.8 Router datapath

The higher the number of input ports, the more complex the datapath of the router is. The depth of the multiplexer structure increases if the technology library supports only two-input multiplexers. The result is worse timing metrics because the crossbar takes longer to be crossed. Moreover the structure of the SA is more complicated, with an arbiter per input port and an arbiter per output port. The same happens to the structure of the VA.


3 [AOS] Scheduling for Uniprocessor Systems

In general the most expensive and important part of a computing system is the Central Processing Unit. A multi-core processor can execute one task per core at the same time; otherwise tasks need to be assigned to the CPU over time.

The scheduler has the purpose of selecting, among the ready processes in memory, the one to be executed. Scheduling decisions may take place when a process:

1. switches from running to waiting state;

2. switches from running to ready state;

3. switches from waiting to ready state;

4. terminates its execution.

It is important to remark that a process can change its state during its lifetime. Running means that the process is being executed by the CPU, waiting means that the process is waiting for an event such as an I/O one, and ready means that the process is ready to be chosen and executed. Moreover, thanks to a swapping operation, a process can reside outside the main memory and be either ready or waiting.

Figure 10: Process state transition.

3.1 Scheduler characteristics

A scheduler has two main objectives:

• it is a robust mechanism to move a process from one state to another;


• it should guarantee efficiency while handling processes.

Furthermore the scheduler can be either preemptive, if it forces the process to release the CPU in order to have it allocated to another one, or non-preemptive, if the process is allowed to run to completion. The latter is likely to be used in a multi-core system, while the latest Linux kernel is preemptive. A preemptive approach has a cost: for example, if two processes share data, they need to be coordinated. Moreover interrupts can come at any time and cannot be ignored. The code in the ISR (Interrupt Service Routine) is not accessed concurrently by several processes.

3.1.1 Types of scheduling

The scheduler can be classified into three categories:

Long-term scheduler: allows programs to be executed as processes in a system and creates fairness to prevent deadlock and starvation;

Medium-term scheduler: is in charge of swapping operations and decides which processes reside in the main memory;

Short-term scheduler: decides which available and ready process will be executed by the processor. It is also known as dispatcher.

3.1.2 The dispatcher

The dispatcher gives the control of the CPU to the process selected by the

short-term scheduler. The dispatch latency is the time taken by the dis-

patcher to stop one process and start another one. In real-time systems it

can be bounded.

3.1.3 Scheduling criteria

CPU utilization: keeps the CPU as busy as possible (the ideal value is about 70%);

Throughput: is the number of processes whose execution is completed within a certain amount of time;

Turnaround time: is the amount of time to execute a specific process;

Waiting time: is the amount of time a process has been waiting in the ready queue;

Response time: is the amount of time between the submission of a request and the first response, not the output.

A further criterion increases the average lifetime of the system by reducing the average temperature of its components.

In real cases it is very important to reduce the variance in the response time rather than the response time itself, and to minimize the average waiting time.

3.2 Scheduling algorithms

3.2.1 First-Come, First-Served

Each process in the ready queue is executed according to the arrival order, as a trivial FIFO strategy. Short processes may have to wait for a long time before their execution. Moreover this algorithm favors CPU-bound processes and is not suitable for interactive ones, but it is easy to implement. It suffers from starvation and does not create fairness at all.
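The FIFO behaviour described above can be sketched in a few lines; the burst lengths below are made up for illustration:

```python
def fcfs_waiting_times(burst_times):
    """Waiting time of each process when served strictly in arrival order
    (all processes assumed to arrive at time 0)."""
    waits, elapsed = [], 0
    for burst in burst_times:
        waits.append(elapsed)  # a process waits for all bursts before it
        elapsed += burst
    return waits

# A long first job makes every short job wait: the long waits the
# text warns short processes may suffer under FCFS.
print(fcfs_waiting_times([24, 3, 3]))  # [0, 24, 27]
```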

3.2.2 Shortest Job First

It associates with each process the length of its next CPU burst and it is optimal, giving the minimum average waiting time for a specific set of processes. Two variants exist:

• the non-preemptive version, also called Shortest Process Next, assigns the CPU to the shortest expected process of those in the ready queue. Starvation for longer processes may occur;

• the preemptive version, also called Shortest Remaining Time, assigns the CPU to the process which is the nearest to its termination. This version is slightly better than the non-preemptive one.

The length of the next burst can be estimated in the following way:

τ_{n+1} = α t_n + (1 − α) τ_n

where τ is a prediction and t stands for a measurement. The parameter α is bounded: 0 ≤ α ≤ 1. If α is near 0, then the recent history does not count. On the other hand, if α is about 1, the recent history determines entirely the prediction.
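The exponential average can be sketched as follows; the values of α and of the initial estimate τ0 below are illustrative choices, not values from the course:

```python
def next_burst_estimate(measured_bursts, alpha=0.5, tau0=10.0):
    """Exponential average tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n.
    alpha near 0 ignores recent history; alpha near 1 uses mostly the
    last measured burst. tau0 is the initial guess for the first burst."""
    tau = tau0
    for t in measured_bursts:
        tau = alpha * t + (1 - alpha) * tau
    return tau

print(next_burst_estimate([6, 4], alpha=0.5, tau0=10))  # 6.0
print(next_burst_estimate([6, 4], alpha=1.0, tau0=10))  # 4.0: only the last burst counts
```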

3.2.3 Highest Response Ratio Next

It executes the process with the highest response ratio, which can be calculated with the following formula:

response ratio = (time spent waiting + expected service time) / expected service time
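A possible sketch of the selection rule, assuming each process is described by its time spent waiting and its expected service time (the process names and values below are invented for illustration):

```python
def highest_response_ratio_next(processes):
    """Pick the process with the highest (waiting + service) / service
    ratio. `processes` maps a name to (time_spent_waiting, service_time)."""
    def ratio(name):
        waiting, service = processes[name]
        return (waiting + service) / service
    return max(processes, key=ratio)

# The short job that has already waited overtakes the long fresh one:
# ratios are 10/10 = 1.0 and (6+2)/2 = 4.0.
jobs = {"long_fresh": (0, 10), "short_waited": (6, 2)}
print(highest_response_ratio_next(jobs))  # short_waited
```

Because the ratio grows with the waiting time, long processes eventually get served too, which is how HRRN avoids the starvation that plain SJF can cause.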

3.2.4 Round-Robin

It is a preemptive scheduling algorithm based on a clock. An amount of time called quantum allows each process to use the CPU for no more than that length of time. When the quantum expires, an NMI interrupt is generated by a counter and the current process is suspended and added to the end of the ready queue. This algorithm creates no starvation because if there are n processes in the ready queue and each process takes no more than q time to be executed, the last process will be assigned to the CPU within (n − 1)q time units. Moreover a large q degenerates into a FIFO scheduler while a small q may add too much overhead to the computation. As a rule of thumb, at least 80% of the CPU bursts should be shorter than q.
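The quantum-based rotation can be sketched as a simple queue simulation; the process names and burst times are illustrative:

```python
from collections import deque

def round_robin_completion(bursts, q):
    """Order in which processes finish under Round-Robin with quantum q.
    `bursts` maps a process name to its remaining CPU time."""
    queue = deque(bursts.items())
    finished = []
    while queue:
        name, remaining = queue.popleft()
        if remaining <= q:               # completes within this quantum
            finished.append(name)
        else:                            # preempted: back to the tail
            queue.append((name, remaining - q))
    return finished

print(round_robin_completion({"A": 5, "B": 2, "C": 3}, q=2))  # ['B', 'C', 'A']
```

With q larger than every burst the loop never preempts anyone and the order degenerates into plain FIFO, as the rule of thumb above suggests.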

3.2.5 Other algorithms

The Feedback Scheduling penalizes those processes which have heavily used the CPU over time.

The Fair-Share Scheduling takes into account that a user's applications run as a collection of processes, therefore scheduling decisions are based on process sets.

The Priority Scheduling associates a priority number to each process and assigns the CPU to the process with the highest priority (lowest number). There are both non-preemptive and preemptive versions of this algorithm, which is actually really easy to implement. It may suffer from starvation, but the solution to that is to increase the priority of a process according to its waiting time.

The ready queue can be partitioned into separate queues characterized

by different scheduling algorithms. For instance, a foreground queue can use

RR to improve the responsiveness while a background one FCFS.

In multi-processor systems the scheduling procedures are more complex. Each processor can have its own ready queue and make its own decisions, or there can be a global ready queue and load sharing. Some systems have a master processor which accesses the data structures and allocates tasks to the other processors.

3.2.6 Modeling

The Analytic Evaluation evaluates an algorithm with respect to a specific workload by using a formula. The Deterministic Modeling takes a workload and defines the performance of various algorithms for that workload. A Queuing Model uses the distribution of process arrival times to carry out a statistical analysis (Little's formula: n = λ · W).


4 DSE and Metrics

4.1 Design metrics

The main goal of the Design Flow is to build an implementation with the

desired functionality. The real challenge is to optimize several design metrics

during the development process. Design metrics are measurable features of

a system’s implementation. They are essential to evaluate the quality of a

specific solution and capture a certain behavior of the system itself. The

evaluation is rarely relevant in absolute terms, but it is very important if

compared to another solution or a set of requirements. Moreover it allows

the designer to explore alternatives. A design model is used in order to

estimate each quality metric. Some common metrics are:

• Unit cost: the cost to manufacture each copy of the system;

• Size: the physical space required by the system;

• Performance: the execution time of the system;

• Power: the system's power consumption;

• Flexibility: the ability to change some system feature without huge NRE costs;

• Time-to-prototype: the time needed to build a first working implementation of the system;

• Time-to-market: the time required to develop the system and sell it to customers;

• Maintainability: the ability to modify the system after its first release;

• Correctness: the system's compliance with requirements.


Metrics usually compete with each other.

4.1.1 Accuracy

The accuracy measures how close the estimated value of a metric is to the actual one after the implementation of the target design:

A = 1 − |E(D) − M(D)| / M(D)

D stands for a design implementation, E(D) is the estimated value of a quality metric for D, while M(D) is the corresponding measured value. A = 1 means the estimate is perfect.
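The definition translates directly into code; the estimated and measured values below are illustrative:

```python
def accuracy(estimated, measured):
    """A = 1 - |E(D) - M(D)| / M(D); A = 1 means a perfect estimate."""
    return 1 - abs(estimated - measured) / measured

print(accuracy(100.0, 100.0))  # 1.0: estimate matches the measurement
print(accuracy(90.0, 100.0))   # 0.9: the estimate is off by 10%
```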

4.1.2 Fidelity

The fidelity is the percentage of correctly predicted comparisons among design implementations. Given a set of possible alternative designs D = {D_1, D_2, D_3, ..., D_n}, the term μ_ij = 1 if:

E(D_i) > E(D_j) and M(D_i) > M(D_j), or
E(D_i) < E(D_j) and M(D_i) < M(D_j), or
E(D_i) = E(D_j) and M(D_i) = M(D_j);

μ_ij = 0 otherwise. Moreover the fidelity percentage can be calculated as follows:

F = 100 · (2 / (n(n − 1))) · Σ_{i=1}^{n} Σ_{j=i+1}^{n} μ_ij
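A small sketch of the fidelity computation, assuming E and M are lists holding the estimated and measured values of one metric for n designs (the sample values are illustrative):

```python
from itertools import combinations

def fidelity(E, M):
    """Percentage of pairs (i, j) whose ordering under the estimate E
    matches the ordering under the measurement M:
    F = 100 * 2/(n(n-1)) * sum over i<j of mu_ij."""
    def cmp(a, b):
        return (a > b) - (a < b)   # -1, 0 or +1
    n = len(E)
    mu = sum(cmp(E[i], E[j]) == cmp(M[i], M[j])
             for i, j in combinations(range(n), 2))
    return 100 * 2 * mu / (n * (n - 1))

print(fidelity([1, 2, 3], [10, 20, 30]))  # 100.0: every comparison predicted correctly
# Three designs: one of the three pairs is ordered the wrong way round.
print(fidelity([1, 2, 3], [1, 3, 2]))
```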

4.1.3 Cost metrics

Both hardware and software cost metrics are used as quality metrics. The first includes the cost of manufacturing chips, packages, testing and design cost. The latter is associated with program and data memory size. The cost of the software is lower than the hardware one and the development time is shorter.

4.1.4 Performance metrics

The performance metrics are split into computation and communication metrics. The first ones give a measure of the time required to perform a computation:

• clock cycle: it affects the execution time since each instruction requires a specific number of cycles to be carried out;

• control steps: a control step corresponds to a single state of the machine control unit;

• stage delay: it is the time required by any stage to perform its computations. As far as pipelines are concerned, the throughput measures how often a result is generated and equals 1 / stage delay;

• execution time: it is the total amount of time between the arrival of data to the pipeline and the generation of the corresponding result: execution time = number of stages × stage delay.

The communication rate is a very important metric because it affects the

bus and chip design. Furthermore it sets constraints among events and it is

represented by timing diagrams.

4.1.5 Other factors affecting system design

The dissipation of power due to charging and discharging capacitors is proportional to the clock frequency and to the switching activity of the gates. Moreover the power consumption affects the design of the battery and the reliability of the internal components, setting a limit to the clock frequency.

Testability has the goal of producing a design with minimum test cost. In order to detect a failure, a specific input has to stress it and make it observable through an output.

4.2 Design Space Exploration

The design space exploration is the activity of exploring design alternatives

before implementation. The goal is to optimize (maximize or minimize) dif-

ferent metrics (e.g.: power consumption, performance, lifetime...) therefore

choosing the best option among a range of possible choices.

Mathematically speaking, the optimization problem can be formulated as follows:

max [f_1(x_1, ..., x_n), f_2(x_1, ..., x_n), ..., f_k(x_1, ..., x_n)]

subject to:

g_j(x̄) ≤ 0
h_j(x̄) ≥ 0
w_j(x̄) = 0
x̄ ∈ S

When k > 1 and the objective functions are in contrast we speak about multi-objective optimization.


The automatic DSE deals with input variables, a black box and output variables:

input variables: entities that define the design space. They are the free parameters, the quantities that the designer can vary or the choices he can make. They can be continuous or discrete.

black box: generates outputs according to the inputs. It can be a set of solvers that model the design problem.

output variables: they are measures from the system.

The objectives are the quantities the designer wants to minimize or max-

imize while the constraints are quantities imposed to the project and define

a feasible region.

We say that a solution "a" dominates another solution "b" if "a" is no worse than "b" in all objectives and "a" is strictly better than "b" in at least one objective. In this case we speak about Pareto dominance.

Figure 11: Pareto dominance.
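The dominance test can be sketched for objectives that are all to be maximized; the objective tuples below are illustrative:

```python
def dominates(a, b):
    """True if solution `a` Pareto-dominates `b` (maximization on all
    objectives): no worse everywhere, strictly better at least once."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better

print(dominates((3, 5), (2, 5)))  # True: equal on one objective, better on the other
print(dominates((3, 5), (5, 3)))  # False: the two solutions are incomparable
```

The solutions that no other solution dominates form the Pareto front, which is the set of trade-offs the designer chooses from.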

The design parameters are not always fixed: they have a mean value and a standard deviation. This is why in a robust optimization the best robust solution is not always the best global solution, due to the fact that the mean value has to be maximized while the standard deviation has to be minimized.

Figure 12: Robust design.

5 Power Management

The goal of the power management is to keep energy and power consumption

under control. This can lead to an extension of the battery life, lower tem-

peratures and more reliability. Hardware and firmware/software solutions

are possible: the first ones react faster to an increase in temperature. Re-

ducing the power consumption means reducing performance, as well as the

throughput and the computing power of the system: this is the trade off to

take into account.

Multicore systems offer a very nice solution to save power because some

cores can be switched off while they are not necessary. The same CPU with

different kinds of cooling can have different computing power due to thermal

issues. Around the maximum temperature, the probability to have a failure

is very high. The probability of failure is also bound to the thermal history

of the system (e.g.: NBTI affecting MOSFETs brings an increase in the

threshold voltage with performance degradation).

Some schedulers consider temperature and power consumption while al-

locating processes to the CPU. The goal is a multi-objective optimization of

the scheduling process.

The increase of the total power is directly proportional to the density

of the components. Moreover leakage power (static power) is becoming a

relevant factor that affects the power consumption: when the component is

doing nothing, there is a leak of energy anyway.

Different requirements concerning the final product determine the power consumption: the request for high performance, for low power, but also a conjunction of the two. The energy is the integral of power over time.

In the past the approach was one design for each product. Now devices

are combined in families whose design is retargetable. This leads to a smaller

time-to-market and a higher reusability and flexibility of the design. Power

saving opportunities are maximized at system and behavioral level (changes

in the ISA, algorithm design, pipelining...).

5.1 Power Consumption in CMOS Devices

The power consumption of one device depends mainly on dynamic and static

power. The dynamic power is the power consumed when the device is active

and the signals are changing values. In particular:

• the switching power is the power required to charge and discharge the output capacitance of a gate;

• the internal power is due to the short-circuit current that occurs when both the NMOS and the PMOS of a gate are switched on.

The static power is the power consumed when the device is not switching the values of signals. Since the dynamic power is dominated by the switching power, the following formula can be used:

P_dyn = C_eff · V_dd² · f_clock

The effective capacitance is equal to C_eff = P_trans · C_l, where C_l is the output capacitance and P_trans is the probability of an output transition. In order to reduce the dynamic power, we can reduce the switching activity, the clock frequency or the power supply voltage, as the formula suggests. The problem is that if V_dd is reduced, the device becomes slower and, to maintain good performance, the threshold voltage of the transistors should be decreased. The result is an increase of the leakage power. Moreover reducing V_dd is no longer a practical solution because the supply voltage has reached a kind of plateau around 1 V.
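The switching-power formula can be checked numerically; the capacitance, activity factor, voltage and frequency values below are illustrative, not from the text:

```python
def dynamic_power(c_load, p_trans, vdd, f_clock):
    """P_dyn = C_eff * Vdd^2 * f_clock, with C_eff = P_trans * C_load."""
    c_eff = p_trans * c_load
    return c_eff * vdd**2 * f_clock

# Illustrative operating point: 1 pF load, 0.2 activity, 1.0 V, 1 GHz.
base = dynamic_power(1e-12, 0.2, 1.0, 1e9)           # 0.2 mW
# Halving the clock halves P_dyn; halving Vdd divides it by 4.
print(dynamic_power(1e-12, 0.2, 1.0, 0.5e9) / base)  # 0.5
print(dynamic_power(1e-12, 0.2, 0.5, 1e9) / base)    # 0.25
```

The quadratic dependence on V_dd is why the voltage-oriented techniques of the next section (multi-Vdd, DVFS) pay off more than frequency scaling alone.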

5.2 Power Reduction Approaches

Frequency and voltage techniques are used to reduce the power consumption.

5.2.1 Clock gating

The clock is transmitted to the component only when it is necessary. It can

be blocked by a gate and different clocks can be used inside a single system.

In this case, the system is divided into subsystems and each part is fed with a specific clock frequency (clock domains).

5.2.2 Multi-Vdd

Since the dynamic power is proportional to V_dd², a reduction can bring decent power saving. Each logic block of the system is supplied with a precise voltage generated by a multi-Vdd power supply. Moreover a higher V_dd can be used for normal operations and a lower one for low-power operations, keeping logic and memory in a retention state.

5.2.3 Power gating

Power gating is designed to reduce both leakage and dynamic power, but

mainly the first one. Certain blocks of the system are completely powered

down thanks to a power gating controller that communicates with a power


switching fabric connected to some power gated functional blocks. It is more

invasive than clock gating. The trade off to take into account is that the

technique brings a huge power saving, but introduces significant time delays.

The power gating is broadly used in multicore microprocessors.

5.2.4 Power management techniques

Dynamic voltage and frequency scaling: the voltage and the frequency are dynamically adjusted, saving power and reaching an operating performance point, that is a couple of voltage and frequency values that are compatible. There are different approaches: first of all a variable amount of energy is allocated to perform a task, then the system idle time can be maximized (a task is executed and completed as fast as possible) or minimized (the lowest operating performance point is selected, reducing static and dynamic power).

Dynamic power switching: determines when a device has completed its computations; if it is not needed anymore, an automatic switch to a low-power mode occurs, reducing leakage power.

Stand-by leakage management: while in DPS only some sections of the device are put in low-power mode, in SLM the entire device enters the low-power state. The device state is saved into an external memory, making the wakeup time faster than a cold boot. The events to exit the low-power mode are user-related (key pressed) and not application-related (timer, DMA...) as in DPS. Moreover in DPS the allowed latency for mode transitions depends on the time constraints of the application, while in SLM it depends on the user sensitivity.

Adaptive voltage scaling: the power supply can be statically and dynamically adapted to silicon performance.

These power management techniques can be combined to increase the power savings.

5.2.5 Linux frameworks for PM

Several Linux frameworks are in charge of supporting the power management. The Linux time system allows rescheduling non-critical timeouts so that they run when the processor wakes up for other reasons. The clock framework gives the possibility to switch off system parts, for example turning off a PLL, but each part has clock dependencies, which the driver keeps under control. A voltage framework concerns voltage regulation and optimizes the usage and efficiency of the regulators. The latency framework allows determining the admissible system latency values. CPUIdle and CPUfreq respectively optimize the static power consumption and the dynamic power consumption; moreover the latter supports DVFS. The QoS framework provides a communication strategy between drivers and applications.

5.2.6 ACPI

The Advanced Configuration and Power Interface specification provides a standard that can be exploited by operating systems to perform power management with respect to hardware components. The BIOS is no longer exclusively in charge of the power management, but loads ACPI tables in system memory. Firmware ACPI functionalities are provided by AML bytecode, stored in ACPI tables, that must be interpreted by the operating system. The ACPI standard defines four global system states (G0-G3, working - mechanical off), five sleep states (S1-S5) concerning the global states G1 and G2, and four processor states (C0-C3).


6 Communication

Communication means exchanging information from one system to another. The main factors that affect the communication are the distance among the systems, the physical characteristics of the channel the information passes through, and the speed of the communication itself. We deal with error prevention and with error detection and correction.

6.1 Serial communication

The serial communication is the easiest way to exchange information. It

consists of the transmission of one bit at a time on just a single logic commu-

nication line between the systems. Moreover the serial communication can

be synchronous or asynchronous. In the first case the clocks of the transmit-

ting and receiving devices are synchronized allowing the absence of a start

and a stop bit. In the second case these bits allow the communication among

devices without a common clock signal.

6.2 Standard serial protocols and buses


6.2.1 I²C

The I²C synchronous serial protocol allows the communication among devices within a limited distance. It makes use of two lines: the devices are connected to SDA (Serial Data Line) and SCL (Serial Clock Line). Moreover a device can be master or slave, but only masters are able to start a transaction. Each packet is delimited by a start condition and a stop condition and has the following properties:

• the start condition occurs whenever SCL is 1 and SDA changes from 1 to 0, followed by a transition of SCL to 0;

• the stop condition occurs whenever SCL is 1 and SDA changes from 0 to 1;

• the internal structure is made of the address of the slave device, the type of the operation (R/W), an ACK signal for the correctness of the address, the data byte and another ACK signal generated by the slave.

SDA and SCL are open-drain lines and require pull-up resistors.


6.2.2 SPI

The SPI protocol is synchronous and only one device can fulfill the role of the

master. The protocol makes use of four special lines: SCLK (Serial Clock),

MOSI (Master Output Slave Input), MISO (Master Input Slave Output)

and SS (Slave Select, one per each slave). All MISO signals are wired-or

connected on a single line therefore whenever a slave device is not active (SS

high), its MISO output must be in high impedance mode.

The communication starts when the master activates one slave, changing the status of the related SS line from high to low. Then the master generates a clock signal on the SCLK line and begins a full-duplex communication thanks to the MISO and MOSI lines.

A daisy-chain connection is also allowed among master and slaves. In

this case there is a single SS line for all the slaves and they are connected in

sequence: the first slave reads data on MOSI and propagates them on MISO

connected to the MOSI port of the next slave device.
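The full-duplex exchange described above can be sketched as a behavioural model of the two shift registers: on every clock a bit leaves the master on MOSI while a bit leaves the slave on MISO, so after eight clocks the two bytes are swapped. This is a simulation, not a device driver, and the byte values are illustrative:

```python
def spi_exchange(master_byte, slave_byte, bits=8):
    """Full-duplex SPI sketch: each clock shifts one bit out of the master
    on MOSI and one bit out of the slave on MISO, MSB first."""
    mask = (1 << bits) - 1
    m, s = master_byte, slave_byte
    for _ in range(bits):
        mosi = (m >> (bits - 1)) & 1       # master's MSB goes to the slave
        miso = (s >> (bits - 1)) & 1       # slave's MSB goes to the master
        m = ((m << 1) & mask) | miso       # master shifts in the MISO bit
        s = ((s << 1) & mask) | mosi       # slave shifts in the MOSI bit
    return m, s                            # the two registers are swapped

print([hex(v) for v in spi_exchange(0xA5, 0x3C)])  # ['0x3c', '0xa5']
```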

6.2.3 Protocols analysis

SPI has a better throughput compared to I²C because of the full-duplex communication and the lack of addressing. On the other hand, the SPI protocol can support only one master and does not provide an acknowledge mechanism, unlike the I²C protocol.

6.2.4 CAN bus

The CAN bus is based on a differential serial communication protocol that allows the exchange of small data packets called frames. It works in asynchronous mode and does not need an explicit arbitration mechanism thanks to the implementation of a priority-based one: a high priority message wins the bus and can use it, while a lower priority message detects that the bus is busy and waits. The structure of any bit is peculiar: first of all there is a synchronization segment with a fixed duration, the propagation segment takes care of delays introduced by physical bus lines and, at the end, two variable length segments keep the clocks synchronized.

6.3 Parallel communication

The parallel communication uses several lines to carry data and is more expensive than the serial solution. The speed and the overall performance are better than in serial communication, but electromagnetic interference among the lines has to be taken into account. Moreover the design of parallel interfaces is much more complex.

The group of lines used for the communication is often called "bus" and the lines are divided into data bus, address bus and control bus.

6.3.1 Parallel protocols

The strobe protocol consists of a request control line. As soon as the master asserts it, the slave has a predefined interval of time to put the data on the bus; then the master reads this data and deasserts the request line, ending the communication. The assumption is that the slave is always able to provide data in time.

The handshake protocol works in a similar way, but provides an extra

acknowledgment line to make the master aware of the data availability after

the assertion of the request line by the master itself. The disadvantage is

that the transaction takes more time.

A mixed protocol is also possible. The strobe/handshake uses a wait line

instead of an acknowledgement one, that can be asserted by the slave in case

of data unavailability.

6.4 Wireless communication

The wireless communication is also possible, but it is not as reliable as cable communications. WiFi has a great speed, but the disadvantages are the power consumption and the coverage. Bluetooth has a very good power consumption if the low energy variant is used, but it is suitable only for short-range communications. GSM provides another communication method that is good as far as coverage is concerned, but costs and power consumption are very high.

7 [AOS] Interfacing

While the word communication is used when two or more systems exchange information, the word interfacing refers to the data transfer among components of the same system. Usually the system is composed of a microprocessor, a memory and some peripherals, and data exchange is performed thanks to a few techniques such as polling, interrupts and DMA.

7.1 Addressing

The microprocessor has to specify the device, memory or peripheral, to talk

to. In order to do that there are different solutions:

Memory mapped I/O: the address space is divided into pieces and each one is assigned to a specific device. In this case the system only needs one shared bus among the memory and the peripherals. All the different devices can be mapped on non-contiguous addresses.

Standard I/O: this solution benefits from an additional line in the control bus in order to specify whether the address refers to a peripheral or to a memory location. The advantage is that the I/O devices do not reduce the address space available for the memory.

Port mapped I/O: the bus is used only by the memory while the other devices communicate directly with the microprocessor through dedicated lines. The disadvantage is the poor scalability of this mechanism; on the other hand the address space reserved for the memory is not reduced, exactly as in the standard I/O way.

The following ones are solutions used to figure out when to access a device (for instance, when a data item is ready to be read).

7.1.1 Polling

First of all, the status of the device must be continuously checked by the microprocessor by reading specific registers. This solution is characterized by some disadvantages: the microprocessor has to read the registers of the device even if data is not ready, wasting its computing power on useless operations. Moreover the ready-data rate of the devices can be very different and this can affect the complexity of the polling routines.


7.1.2 Interrupt

In order to avoid the waste of computing power, the interrupt mechanism

can be implemented. The interrupt is a signal sent by the peripherals to the

microprocessor pointing out that available data has to be read. The com-

munication is completely asynchronous. This requires at least one interrupt

request line often called INT. At the end of each assembly instruction, the

microprocessor checks the value of the interrupt line: if it is active, the mi-

croprocessor halts the current program execution and executes the interrupt

service routine to read the available data from the requesting peripheral.

After that, the execution of the program resumes.

The interrupt mechanism can be fulfilled in many ways: each peripheral

can have its own interrupt line or one single line can be shared among the

peripherals. In this case when a peripheral makes a request, the others are

inhibited while the microprocessor carries out the request itself. Moreover the peripherals can be connected to each other in a daisy chain crossed by the interrupt line. This has the advantage of establishing a simple priority mechanism.


One of the main issues is the association of the interrupt signal with the correct peripheral, such that the right routine is executed. The address of the interrupt service routine can be fixed, therefore when the interrupt occurs the microprocessor accesses a predefined memory area. The limitations of this solution are overcome by the vectored interrupt mechanism: whenever an interrupt occurs, the peripheral writes on the bus the address related to its own service routine. The microprocessor now knows the location of the ISR and jumps to that location to execute it.

Sometimes it may be necessary to disable interrupt requests completely, for example during the execution of critical sections of code. In this case interrupts can be masked: the easiest way is to use the assembly instruction DI (disable interrupts). Some interrupt requests, however, must always be served: those are called NMI (non-maskable interrupts).
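A common pattern is to bracket the critical section with the disable/enable instructions. In the sketch below all names are invented for illustration: the two functions stand in for DI/EI and a flag simulates the CPU's interrupt-enable bit.

```c
/* Stand-in for the CPU's interrupt-enable flag. */
volatile int irq_enabled = 1;

void disable_interrupts(void) { irq_enabled = 0; }  /* DI */
void enable_interrupts(void)  { irq_enabled = 1; }  /* EI */

volatile long shared_counter = 0;

/* A read-modify-write of shared state must not interleave with an
 * ISR touching the same variable, so interrupts are masked for the
 * duration of the update. */
void critical_update(long delta)
{
    disable_interrupts();
    shared_counter += delta;   /* critical section */
    enable_interrupts();
}
```

Keeping the masked region as short as possible matters: while interrupts are disabled, every (maskable) request waits.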

7.1.3 DMA

The DMA controller has the capability to exchange data between peripherals and memory without consuming the computing power of the microprocessor. The peripheral is directly connected to the DMA controller rather than to the microprocessor. When an interrupt is generated, the DMA controller requests the bus from the processor, which replies with an ACK signal. The main disadvantage is that for the whole duration of the data transfer the bus cannot be used by the microprocessor. Moreover, the DMA controller needs the address of the peripheral, the memory address where data has to be written, and the amount of data to be transferred.
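The three parameters the controller needs can be pictured as a small register block that the CPU programs before kicking off the transfer. The register names and the struct below are invented for illustration, and the controller's work is simulated with a `memcpy` instead of real bus mastering.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical DMA controller register block, simulated as an
 * ordinary struct instead of a memory-mapped region. */
struct dma_regs {
    uintptr_t src;    /* peripheral (source) address  */
    uintptr_t dst;    /* memory (destination) address */
    uint32_t  count;  /* bytes to transfer            */
    uint32_t  start;  /* write 1 to begin             */
};

/* The CPU only sets up the three parameters listed in the text and
 * starts the transfer; from then on the controller owns the bus. */
void dma_start(struct dma_regs *d, const void *src, void *dst, uint32_t n)
{
    d->src   = (uintptr_t)src;
    d->dst   = (uintptr_t)dst;
    d->count = n;
    d->start = 1;
    /* Simulate the controller carrying out the transfer. */
    memcpy((void *)d->dst, (const void *)d->src, d->count);
    d->start = 0;   /* transfer complete */
}
```

On real hardware the CPU would typically be notified of completion by an interrupt from the DMA controller rather than by polling `start`.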


8 Hardware Technologies

Hardware technologies allow the realization of the analog and digital circuits at the base of any embedded system. The designer can choose among several hardware solutions:

- Commercial Off The Shelf (COTS): components produced in large volumes that are available on the market and can be used in our system. They are directly mounted on the board and ensure a short time-to-market, but they are not always optimized in terms of power consumption and performance.

- Microprocessors: if the designer needs to introduce complex functions that are not offered by standard components, microprocessors are the right choice. The software is in charge of providing the specific function.

- Programmable logic: this kind of solution offers a huge amount of logic, storage and communication resources. The designer has to configure the device, which is very flexible and more efficient than a software-only solution.

- ASIC: integrated circuits developed for one specific purpose only. Their performance and optimized power consumption are unmatched by the previous solutions. To be affordable, the production volume must be very high.

Large-scale projects may take advantage of hardware blocks that have already been developed. As far as Intellectual Property (IP) is concerned, the reusability of the design can be classified in two ways. A Hard Macro IP cannot be modified by the designer, preserving the original specifications of the block and preventing reverse engineering; only the external interfaces are specified. A Soft Macro IP provides a thorough description (possibly in VHDL or Verilog) of the block that can be easily integrated in our design.
Hardware design flow

The hardware design flow converts a synthesizable specification into a digital circuit. The flow is divided into two parts: the front-end and the back-end. The front-end starts with an HDL description of the hardware blocks; each block is verified using a specific test-bench, optimized and technologically mapped. The back-end deals with the floorplanning, that is the partition of the silicon area among the functional blocks; then the layout of the circuit is built by the placement and routing phases, defining the geometry used to manufacture the required masks. As far as programmable logic devices are concerned, the back-end is extremely simplified because cells and interconnections are already present on the device, which only needs to be programmed.

8.1 ASIC Prototyping Technologies

8.1.1 The planar process

The planar process is at the base of ASIC technology. Silicon is used as the semiconductor, while interconnections among components are made of copper. The process allows obtaining the following elements on the starting pure Silicon: p-type Silicon, n-type Silicon, an insulator to isolate the active zones of p and n Silicon (SiO2 is commonly used), and a conductor. The phases of the planar process are the following:

1. In order to make the starting Silicon layer very pure, the impurities are removed using a hot coil that collects them in one specific area only; that area is then cut away.

2. Starting from the wafer, a circular Silicon slice, the photoresist is deposited.

3. The wafer is exposed to UV light in order to make specific zones of the photoresist strong enough to resist chemical agents. This is the masking phase.

4. In the washing phase, the parts of the photoresist that have not been exposed to the UV light are removed.

5. p-type and n-type Silicon regions are created, and a new round of photoresist deposition, masking and washing is carried out.

6. As soon as all the active zones are ready, the wafer is covered by a layer of Silicon dioxide, which is later masked and selectively removed.

7. Lastly, a thin layer of conductive material (Aluminum) is used to create contacts on the active parts of the device.

The manufacturing of an ASIC using the planar process can require a number

of masks that is in the order of 25-30. This is why making ASICs is extremely

expensive.

8.1.2 Standard cell and gate array technologies

The following technologies simplify the ASIC manufacturing process. The standard cell technology limits the degrees of freedom offered by the planar process in order to keep the prototyping complexity of digital circuits under control; the back-end phase of the project is simplified by standardizing some aspects. A cell is a small digital circuit that has already been designed and optimized. It can fulfill different roles such as logic gate, sequential component, memory... The structure of a standard cell chip is characterized by a pad ring, rows that implement the functional logic, and channels that contain the interconnection lines.

The gate array technology simplifies the ASIC manufacturing process

starting from a partially available chip that provides groups of p and n tran-

sistors. Moreover at the top and at the bottom of the gate array a Vdd rail

and a Vss rail can be found. The peculiarity of this technology is that only

the interconnections among the transistors must be described to get to the

final product.

8.1.3 Full custom

The full custom technology offers the designer the maximum flexibility and degree of freedom in prototyping and developing digital circuits. Moreover, the designer is not obliged to follow any geometrical constraint regarding the layout of the circuit.
8.2 Programmable Technologies

The prototyping activity of the designer can be further simplified by the use of programmable logic devices, which integrate logic resources and programmable interconnections. Unlike an ASIC, a PLD only has to be programmed in order to provide the desired functionality and does not require chip manufacturing. Any PLD has a specific programming mode:

- One-time programmable (OTP): the device can be programmed only once. These devices implement a fuse or antifuse technology based on the modification of the internal physical structure of the device.

- Reprogrammable: the device can be configured multiple times. The cost is higher than that of an OTP device, and the devices implement a volatile or non-volatile memory (EEPROM).

- In-system programmable: the device can be dynamically and selectively programmed even while it is working and in use (SRAM, Flash memory).


Two remarkable factors that characterize any programmable logic device are the organization of the cells and the structure of the interconnections. The cells are often arranged in a very regular, matrix-like structure. Highly cell-populated zones can alternate with interconnection zones, or the distribution of cells and interconnections can be uniform over the whole chip surface (FPGA). Furthermore, the interconnections strongly determine the performance and the latency of a PLD: a global connection crosses the whole chip, while a local connection involves only a small number of cells.

8.2.1 FPGA

The FPGA is the evolution of older programmable devices such as PAL, PLA and GAL. It is characterized by a distributed, matrix-like structure of the logic resources. In the middle, block-RAM can typically be found, while the cells on the edge are in charge of I/O operations. In the corners there are usually some blocks dedicated to specific functions such as CLK generators, digital PLLs and modules devoted to the programming of the device. The FPGA is the most flexible and powerful choice when the production volume is not very large: it makes the prototyping phase faster and offers a good cost-performance ratio. Product design using an FPGA is almost exclusively front-end related.

8.3 Feasibility Study

The main purpose of the feasibility study is to analyze and evaluate the ad-

vantage of a proposed solution. It can deal with an existing product that

is experiencing a technological evolution or a new one that has to be de-

signed from scratch. The main aspects of the study are the elicitation of the

characteristics of the system-to-be, the selection of the suppliers and a deep

analysis of the costs related to the proposed solution, also with respect to

the possibly existing one.

The functional requirements and constraints of the final product may determine the architecture to be used, for example the choice of an ASIC instead of COTS components or programmable logic such as an FPGA. In this case the possible benefits are an easier PCB routing and a reduction of electromagnetic interference, solving pin-out problems. The front-end and the back-end of the design flow can be carried out by different entities: a design center and a silicon foundry. Both determine significant costs.

The front-end related costs mainly deal with HDL code development and analysis, testing and simulation, while the back-end costs are mostly related to IP blocks, the production of the masks and the manufacturing process.


