Embedded Systems - Advanced Operating Systems - Notes
buffer start to be used at the SA cycle because the upstream SA allocates
flits in the downstream router. The flit is physically moved away from the
buffer of the input port during the ST stage.
Credit-based flow control
The upstream router keeps track of the available buffer slots in the downstream one using credits. Whenever a flit is sent to the downstream router, a credit is consumed. The number of cycles a credit takes to get back is given by: UP.ST + UP.LT + DW.BW + DW.VA + DW.SA + 1 (or more) cycle to move back across the link + 1 cycle to be stored in the output port of the upstream router. In the next cycle the credit is ready to be used again. The buffer needs at least 7 slots because the same slot can be reused only after the seventh cycle.
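The credit loop described above can be sketched with a toy cycle-level model. This is a simplified illustration, not part of the original notes; the 7-cycle round trip and the 7-slot buffer match the count derived above.

```python
# Toy model of credit-based flow control: the upstream router may send a
# flit only when it holds a credit; each spent credit returns after a fixed
# round trip (ST + LT + BW + VA + SA + link + store = 7 cycles here).
from collections import deque

def simulate(cycles, buffer_slots=7, round_trip=7):
    credits = buffer_slots          # one credit per downstream buffer slot
    in_flight = deque()             # cycles at which spent credits return
    sent = 0
    for cycle in range(cycles):
        # credits that completed the round trip come back to the upstream
        while in_flight and in_flight[0] == cycle:
            in_flight.popleft()
            credits += 1
        if credits > 0:             # one credit is consumed per flit sent
            credits -= 1
            in_flight.append(cycle + round_trip)
            sent += 1
    return sent

# With 7 slots and a 7-cycle loop the link never stalls: one flit per cycle.
print(simulate(100))   # 100
# With fewer slots the upstream runs out of credits and throughput drops.
print(simulate(100, buffer_slots=3))
```

With fewer slots than the round-trip length, the upstream periodically stalls waiting for credits, which is exactly why at least 7 slots are needed here.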
On/Off flow control
The upstream router does not have precise information about each downstream buffer. The downstream router sends on and off signals to the upstream one according to the state of its buffers. It is a coarser mechanism than the credit-based one.
Ack/Nack flow control
The delivery of each flit has to be confirmed by an acknowledgment sent to
the upstream router which keeps the sent flit until an ACK is received.
2.7.8 Router datapath
The higher the number of input ports, the more complex the datapath of the router. The depth of the multiplexer structure increases if the technology library supports only two-input multiplexers. The result is worse timing metrics because the crossbar takes longer to be crossed. Moreover the structure of the SA is more complicated, with an arbiter per input port and an arbiter per output port. The same happens to the structure of the VA.
3 [AOS] Scheduling for Uniprocessor Systems
In general the most expensive and important part of a computing system is the Central Processing Unit. It can execute one task at a time (one per core, as far as multi-core processors are concerned), so tasks need to be assigned to the CPU over time.
The scheduler has the purpose of selecting, among the ready processes in memory, the one to be executed. Scheduling decisions may take place when a process:
1. switches from running to waiting state;
2. switches from running to ready state;
3. switches from waiting to ready state;
4. terminates its execution.
It is important to remark that a process can change its state during its lifetime. Running means that the process is being executed by the CPU, waiting means that the process is waiting for an event such as an I/O one, ready means that the process is ready to be chosen and executed. Moreover, thanks to a swapping operation, a process can reside outside the main memory and be either ready or waiting.
Figure: process state transition diagram.
3.1 Scheduler characteristics
A scheduler has two main objectives:
• it is a robust mechanism to move a process from one state to another;
• it should guarantee efficiency while handling processes.
Furthermore the scheduler can be either preemptive, if it forces a process to release the CPU in order to have it allocated to another one, or non-preemptive, if the process is allowed to run to completion. The latter is likely to be used in a multi-core system, while the latest Linux kernel is preemptive. A preemptive approach has a cost: for example, if two processes share data, they need to be coordinated. Moreover, interrupts can come at any time and cannot be ignored. The code in the ISR (Interrupt Service Routine) is not accessed concurrently by several processes.
3.1.1 Types of scheduling
The scheduler can be classified into three categories:
• long-term scheduling: allows programs to be executed as processes in a system and creates fairness to prevent deadlock and starvation;
• medium-term scheduling: is in charge of swapping operations and decides which processes reside in the main memory;
• short-term scheduling: decides which available and ready process will be executed by the processor. It is also known as dispatcher.
3.1.2 The dispatcher
The dispatcher gives the control of the CPU to the process selected by the
short-term scheduler. The dispatch latency is the time taken by the dis-
patcher to stop one process and start another one. In real-time systems it
can be bounded.
3.1.3 Scheduling criteria
• CPU utilization: keeps the CPU as busy as possible (the ideal value is 100%);
• throughput: the number of processes whose execution is completed within a certain amount of time;
• turnaround time: the amount of time to execute a specific process;
• waiting time: the amount of time a process has been waiting in the ready queue;
• response time: the amount of time between the submission of a request and the first response, not the output;
• temperature: a lower average temperature of the components increases the average lifetime of the system.
In real cases it is very important to reduce the variance in the response time rather than the response time itself, and to minimize the average waiting time.
3.2 Scheduling algorithms
3.2.1 First-Come, First-Served
Each process in the ready queue is executed according to the arrival order, as in a trivial FIFO strategy. Short processes may have to wait for a long time before their execution; moreover this algorithm favors CPU-bound processes and is not suitable for interactive ones, although it is easy to implement. It suffers from starvation and does not create fairness at all.
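As a quick illustration of the long-wait problem (the burst lengths below are made-up values, not from the notes):

```python
# FCFS waiting times: each process waits for the sum of the bursts of all
# processes that arrived before it (here all arrive at time 0).
def fcfs_waiting_times(bursts):
    waits, elapsed = [], 0
    for burst in bursts:
        waits.append(elapsed)
        elapsed += burst
    return waits

# A long job ahead of short ones inflates the average waiting time...
print(fcfs_waiting_times([24, 3, 3]))   # [0, 24, 27] -> average 17
# ...while the very same jobs, shortest first, wait far less.
print(fcfs_waiting_times([3, 3, 24]))   # [0, 3, 6]  -> average 3
```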
3.2.2 Shortest Job First
It associates with each process the length of its next CPU burst and it is optimal, giving the minimum average waiting time for a specific set of processes. Two variants exist:
• non-preemptive: also called Shortest Process Next, assigns the CPU to the shortest expected process among those in the ready queue. Starvation for longer processes may occur;
• preemptive: also called Shortest Remaining Time, assigns the CPU to the process which is the nearest to its termination. This version is slightly better than the non-preemptive one.
The length of the next burst can be estimated in the following way:

τ(n+1) = α · t(n) + (1 − α) · τ(n)

where τ is the prediction and t stands for a measurement. The parameter α is bounded: 0 ≤ α ≤ 1. If α is near 0, then the recent history does not count. On the other hand, if α is about 1, the recent history entirely determines the estimate.
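The estimate can be computed iteratively; a small sketch (the initial prediction and the measured bursts are made-up values):

```python
# Exponential averaging of CPU burst lengths:
#   tau(n+1) = alpha * t(n) + (1 - alpha) * tau(n)
def predict_bursts(measurements, tau0, alpha):
    tau = tau0
    history = [tau]
    for t in measurements:
        tau = alpha * t + (1 - alpha) * tau
        history.append(tau)
    return history

# alpha = 0.5 weighs the last measurement and the past prediction equally.
print(predict_bursts([6, 4, 6, 4], tau0=10, alpha=0.5))  # [10, 8.0, 6.0, 6.0, 5.0]
# alpha = 0 ignores the measurements entirely: the prediction never changes.
print(predict_bursts([6, 4, 6, 4], tau0=10, alpha=0.0))
```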
3.2.3 Highest Response Ratio Next
It executes the process with the highest response ratio, which can be calculated with the following formula:

response ratio = (time spent waiting + expected service time) / expected service time
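The selection rule can be sketched as follows (the process names, waiting times and service times are hypothetical):

```python
# HRRN: pick the ready process with the highest
# (time spent waiting + expected service time) / expected service time.
def hrrn_pick(processes):
    # processes: list of (name, time_spent_waiting, expected_service_time)
    def ratio(p):
        _, waiting, service = p
        return (waiting + service) / service
    return max(processes, key=ratio)[0]

# A short job that has waited long overtakes a longer, fresher one.
ready = [("A", 10, 2), ("B", 2, 1), ("C", 0, 5)]
print(hrrn_pick(ready))   # A  (ratio 6.0 vs 3.0 and 1.0)
```

Note how the waiting time in the numerator prevents starvation: the longer a process waits, the higher its ratio grows.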
3.2.4 Round Robin
It is a preemptive scheduling algorithm based on a clock. An amount of time called quantum allows each process to use the CPU for no more than that length of time. When the quantum expires, an NMI is generated by a counter and the current process is suspended and added to the end of the ready queue. This algorithm creates no starvation because, if there are n processes in the ready queue and each process takes no more than q time to be executed, the last process will be assigned to the CPU within (n − 1)q time units. Moreover a large q degenerates into a FIFO scheduler, while a small q may add too much overhead to the computation. As a rule of thumb, at least 80% of the CPU bursts should be shorter than q.
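The behaviour described above can be checked with a minimal Round Robin simulation (process names and burst lengths are invented; context-switch overhead is ignored):

```python
from collections import deque

# Minimal Round Robin: each process runs for at most one quantum q, then
# goes back to the tail of the ready queue until its burst is exhausted.
def round_robin(bursts, q):
    queue = deque((name, burst) for name, burst in bursts.items())
    clock, completion = 0, {}
    while queue:
        name, remaining = queue.popleft()
        run = min(q, remaining)
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))
        else:
            completion[name] = clock
    return completion

done = round_robin({"P1": 24, "P2": 3, "P3": 3}, q=4)
print(done)   # {'P2': 7, 'P3': 10, 'P1': 30}
```

The short processes P2 and P3 finish within (n − 1)q + their own bursts, instead of waiting behind the long P1 as in FCFS.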
3.2.5 Other algorithms
The Feedback Scheduling penalizes those processes which have heavily used the CPU over time.
The Fair-Share Scheduling takes into account that a user's applications run as a collection of processes, therefore scheduling decisions are based on groups of processes rather than on individual ones.
The Priority Scheduling associates a priority number to each process and assigns the CPU to the process with the highest priority (lowest number). There are both non-preemptive and preemptive versions of this algorithm, which is actually really easy to implement. It may suffer from starvation, but the solution is to increase the priority of a process according to its waiting time (aging).
The ready queue can be partitioned into separate queues characterized by different scheduling algorithms. For instance, a foreground queue can use RR to improve the responsiveness, while a background one can use FCFS.
In multi-processor systems the scheduling procedures are more complex.
Each processor can have its own ready queue and take its own decisions or
there can be a global ready queue and a load sharing. Some systems have
a master processor which accesses the data structures and allocates tasks to
the other processors.
The Analytic Evaluation evaluates an algorithm with respect to a specific workload by using a formula. The Deterministic Modeling takes a workload and defines the performance of various algorithms for that workload. A Queuing Model uses the distribution of process arrival times to carry out a statistical analysis (Little's formula: n = λ · W, where n is the average queue length, λ the average arrival rate and W the average waiting time).
4 DSE and Metrics
4.1 Design metrics
The main goal of the Design Flow is to build an implementation with the
desired functionality. The real challenge is to optimize several design metrics
during the development process. Design metrics are measurable features of
a system’s implementation. They are essential to evaluate the quality of a
specific solution and capture a certain behavior of the system itself. The
evaluation is rarely relevant in absolute terms, but it is very important if
compared to another solution or a set of requirements. Moreover it allows
the designer to explore alternatives. A design model is used in order to
estimate each quality metric. Some common metrics are:
• unit cost: the cost to manufacture each copy of the system;
• size: the physical space required by the system;
• performance: the execution time of the system;
• power: the system's power consumption;
• flexibility: the ability to change some system feature without heavy redesign costs;
• time-to-prototype: the time needed to build a first working implementation of the system;
• time-to-market: the time required to develop the system and sell it;
• maintainability: the ability to modify the system after its first release;
• correctness: the system's compliance with the requirements.
Metrics usually compete with each other.
4.1.1 Accuracy
The accuracy measures how close the estimated value of a metric is to the actual one after the implementation of the target design:

A = 1 − |E(D) − M(D)| / M(D)

where D stands for a design implementation, E(D) is the estimated value of a quality metric for D, while M(D) is the corresponding measured value. A = 1 means the estimate is perfect.
4.1.2 Fidelity
The fidelity is the percentage of correctly predicted comparisons among design implementations. Given a set of possible alternative designs D = {D1, D2, D3, ..., Dn}, the indicator is μ_ij = 1 if:

E(Di) > E(Dj) and M(Di) > M(Dj), or
E(Di) < E(Dj) and M(Di) < M(Dj), or
E(Di) = E(Dj) and M(Di) = M(Dj)

and μ_ij = 0 otherwise. Moreover the fidelity percentage can be calculated as follows:

F = 100 · (2 / (n(n − 1))) · Σ_{i=1..n} Σ_{j=i+1..n} μ_ij
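As an illustration, the fidelity computation can be sketched in code (the estimated and measured values below are invented):

```python
# Fidelity: percentage of design pairs (i, j) whose estimated values E and
# measured values M are ordered the same way.
def fidelity(E, M):
    n = len(E)
    def cmp(a, b):
        return (a > b) - (a < b)   # sign of the comparison: -1, 0 or +1
    mu = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if cmp(E[i], E[j]) == cmp(M[i], M[j])
    )
    return 100 * 2 / (n * (n - 1)) * mu

E = [10, 20, 30, 40]      # estimated metric for designs D1..D4
M = [12, 25, 28, 20]      # measured metric: D4 is mispredicted vs D2 and D3
print(fidelity(E, M))     # ~66.7 (4 of 6 pairs predicted correctly)
```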
4.1.3 Cost metrics
Both hardware and software cost metrics are used as quality metrics. The former includes the cost of manufacturing chips, packages, testing and design cost. The latter is associated with program and data memory size. The cost of the software is lower than the hardware one and its development time is shorter.
4.1.4 Performance metrics
The performance metrics are split into computation and communication metrics. The first ones give a measure of the time required to perform a computation:
• clock cycle: it affects the execution time since each instruction requires a specific number of cycles to be carried out;
• control steps: a control step corresponds to a single state of the machine;
• stage delay: it is the time required by any stage to perform its computations. As far as pipelines are concerned, the throughput measures how often a result is generated and it is equal to 1/stage delay;
• execution time: it is the total amount of time between the arrival of
data to the pipeline and the generation of the corresponding result,
execution time = number of stages * stage delay
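A quick numeric sketch of these two definitions (the stage count and stage delay are made-up values):

```python
# Pipeline metrics from the definitions above:
#   throughput     = 1 / stage_delay                 (results per time unit)
#   execution time = number_of_stages * stage_delay  (latency of one datum)
def pipeline_metrics(num_stages, stage_delay):
    return 1 / stage_delay, num_stages * stage_delay

throughput, latency = pipeline_metrics(num_stages=5, stage_delay=2.0)
print(throughput)   # 0.5
print(latency)      # 10.0
```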
The communication rate is a very important metric because it affects the
bus and chip design. Furthermore it sets constraints among events and it is
represented by timing diagrams.
4.1.5 Other factors affecting system design
The dissipation of power due to charging and discharging capacitors is proportional to the clock frequency and to the switching activity of the gates. Moreover the power consumption affects the design of the battery and the reliability of the internal components, setting a limit to the clock frequency.
Testability has the goal of producing a design with minimum test cost. In order to detect a failure, a specific input has to stress it and make it observable through an output.
4.2 Design Space Exploration
The design space exploration is the activity of exploring design alternatives
before implementation. The goal is to optimize (maximize or minimize) dif-
ferent metrics (e.g.: power consumption, performance, lifetime...) therefore
choosing the best option among a range of possible choices.
Mathematically speaking, the optimization problem can be formulated as follows:

max [f1(x1, ..., xn), f2(x1, ..., xn), ..., fk(x1, ..., xn)]
subject to g(x̄) = 0

When k > 1 and the functions are in contrast, we speak about multi-objective optimization.
The automatic DSE deals with input variables, a black box and output entities that define the design space.
• Input variables: the free parameters, the quantities that the designer can vary or the choices he can make. They can be continuous or discrete.
• Black box: generates outputs according to the inputs. It can be a set of solvers that model the design problem.
• Output entities: the measures taken from the system.
The objectives are the quantities the designer wants to minimize or max-
imize while the constraints are quantities imposed to the project and define
a feasible region.
We say that a solution "a" dominates another solution "b" if "a" is no worse than "b" in all objectives and "a" is strictly better than "b" in at least one objective. In this case we speak about Pareto dominance.
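The dominance test can be written directly from this definition; a small sketch assuming, for illustration, that every objective is to be minimized (the design points are hypothetical):

```python
# Pareto dominance for minimization: a dominates b if a is no worse than b
# in every objective and strictly better in at least one of them.
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# (power, latency) pairs for three hypothetical design points
p1, p2, p3 = (2.0, 10.0), (3.0, 12.0), (1.5, 14.0)
print(dominates(p1, p2))   # True: better in both objectives
print(dominates(p1, p3))   # False: p3 has lower power
print(dominates(p3, p1))   # False: p1 has lower latency -> incomparable
```

Points that no other point dominates (here p1 and p3) form the Pareto front among the evaluated designs.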
The design parameters are not always fixed: they have a mean value and a standard deviation. This is why, in a robust design optimization, the best robust solution is not always the best global solution, due to the fact that the mean value has to be maximized while the standard deviation has to be minimized.
5 Power Management
The goal of the power management is to keep energy and power consumption
under control. This can lead to an extension of the battery life, lower tem-
peratures and more reliability. Hardware and firmware/software solutions
are possible: the first ones react faster to an increase in temperature. Re-
ducing the power consumption means reducing performance, as well as the
throughput and the computing power of the system: this is the trade off to
take into account.
Multicore systems offer a very nice solution to save power because some
cores can be switched off while they are not necessary. The same CPU with
different kinds of cooling can have different computing power due to thermal
issues. Around the maximum temperature, the probability to have a failure
is very high. The probability of failure is also bound to the thermal history
of the system (e.g.: NBTI affecting MOSFETs brings an increase in the
threshold voltage with performance degradation).
Some schedulers consider temperature and power consumption while al-
locating processes to the CPU. The goal is a multi-objective optimization of
the scheduling process.
The increase of the total power is directly proportional to the density
of the components. Moreover leakage power (static power) is becoming a
relevant factor that affects the power consumption: when the component is
doing nothing, there is a leak of energy anyway.
Different requirements concerning the final product determine the power
consumption: the request of high performance, low power, but also a con-
junction of the two. The energy is the integral of power over time.
In the past the approach was one design for each product. Now devices
are combined in families whose design is retargetable. This leads to a smaller
time-to-market and a higher reusability and flexibility of the design. Power
saving opportunities are maximized at system and behavioral level (changes
in the ISA, algorithm design, pipelining...).
5.1 Power Consumption in CMOS Devices
The power consumption of one device depends mainly on dynamic and static
power. The dynamic power is the power consumed when the device is active
and the signals are changing values. In particular:
• the switching power is the power required to charge and discharge out-
put capacitance of a gate; 29
• the internal power is a short circuit current that occurs when both
NMOS and PMOS of a gate are switched on.
The static power is the power consumed when the device is not switching the
values of signals. Since the dynamic power is dominated by the switching
power, the following formula can be used:

P_dyn = C_eff · Vdd² · f_clock

The effective capacitance is equal to C_eff = P_trans · C_l, where C_l is the output capacitance and P_trans is the probability of an output transition. In order to reduce the dynamic power, we can reduce the switching activity, the clock frequency or the power supply voltage, as the formula suggests. The problem is that if Vdd is reduced, the device becomes slower, and to maintain good performance the threshold voltage of the transistors should be decreased. The result is an increase of the leakage power. Moreover, reducing Vdd is no longer a practical solution because the supply voltage has reached a kind of plateau.
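A numeric sketch of the switching-power formula above (all component values are invented for illustration):

```python
# Switching power: P_dyn = C_eff * Vdd^2 * f_clock, with C_eff = P_trans * C_l.
def dynamic_power(c_load, p_trans, vdd, f_clock):
    c_eff = p_trans * c_load          # effective switched capacitance
    return c_eff * vdd ** 2 * f_clock

# 10 fF output load, 0.2 transition probability, 1.0 V supply, 1 GHz clock
p = dynamic_power(c_load=10e-15, p_trans=0.2, vdd=1.0, f_clock=1e9)
print(p)   # about 2 microwatts for this single gate
# Halving Vdd cuts the power by 4x, as the quadratic dependence suggests.
print(dynamic_power(10e-15, 0.2, 0.5, 1e9))
```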
5.2 Power Reduction Approaches
Frequency and voltage techniques are used to reduce the power consumption.
5.2.1 Clock gating
The clock is transmitted to the component only when it is necessary. It can
be blocked by a gate and different clocks can be used inside a single system.
In this case, the system is divided in subsystems and each part is fed with a
specific clock frequency (clock domains).
5.2.2 Multi-Vdd
Since the dynamic power is proportional to Vdd², a reduction of the supply voltage can bring a decent power saving. Each logic block of the system is supplied with a precise voltage generated by a multi-Vdd power supply. Moreover, a higher Vdd can be used for normal operations and a lower one for low-power operations, keeping logic and memory in a retention state.
5.2.3 Power gating
Power gating is designed to reduce both leakage and dynamic power, but
mainly the first one. Certain blocks of the system are completely powered
down thanks to a power gating controller that communicates with a power
switching fabric connected to some power gated functional blocks. It is more
invasive than clock gating. The trade off to take into account is that the
technique brings a huge power saving, but introduces significant time delays.
The power gating is broadly used in multicore microprocessors.
5.2.4 Power management techniques
Dynamic voltage and frequency scaling: the voltage and the frequency are dynamically adjusted, saving power and reaching an operating performance point, that is a couple of voltage and frequency that are compatible. There are different approaches: first of all a variable amount of energy is allocated to perform a task, then the system idle time can be maximized (a task is executed and completed as fast as possible) or minimized (the lowest operating performance point is selected, reducing static and dynamic power).
Dynamic power switching: determines when a device has completed its computations; if it is not needed anymore, an automatic switch to low-power mode occurs, reducing leakage power.
Stand-by leakage management: while in DPS only some sections of the device are put in low-power mode, in SLM the entire device enters the low-power state. The device state is saved into an external memory, making the wakeup time faster than a cold boot. The events to exit the low-power mode are user-related (key pressed) and not application-related (timer, DMA...) as in DPS. Moreover, in DPS the allowed latency for mode transitions depends on the time constraints of the application, while in SLM it depends on the user sensitivity.
Adaptive voltage scaling: the power supply can be statically and dynamically adapted to silicon performance.
These power management techniques can be combined to increase the power saving.
5.2.5 Linux frameworks for PM
Several Linux frameworks are in charge of supporting the power management.
The Linux time system allows rescheduling non-critical timeouts so that they run when the processor wakes up for other reasons. The clock framework gives the possibility of switching off system parts, for example turning off a PLL, but each part has clock dependencies, which the driver keeps under control. A voltage framework concerns voltage regulation and optimizes the usage and efficiency of the regulators. The latency framework allows determining the admissible system latency values. CPUIdle and CPUfreq
respectively optimize the static power consumption and the dynamic power
consumption. Moreover the latter supports DVFS. The QoS framework pro-
vides a communication strategy between drivers and applications.
The Advanced Configuration and Power Interface specification provides a
standard that can be exploited by operating systems to perform power man-
agement with respect to hardware components. The BIOS is no longer exclusively in charge of the power management, but loads ACPI tables into system memory. Firmware ACPI functionalities are provided by AML bytecode,
stored in ACPI tables, that must be interpreted by the operating system.
The ACPI standard defines four global system states (G0-G3, working -
mechanical off), five sleep states (S1-S5) concerning the global states G1 and
G2 and four processor states (C0-C3).
6 Communication
Communication means exchanging information from one system to another. The main factors that affect the communication are the distance among the systems, the physical characteristics of the channel where information passes through, and the speed of the communication itself. We deal with error prevention and with error detection and correction.
6.1 Serial communication
The serial communication is the easiest way to exchange information. It
consists of the transmission of one bit at a time on just a single logic commu-
nication line between the systems. Moreover the serial communication can
be synchronous or asynchronous. In the first case the clocks of the transmit-
ting and receiving devices are synchronized allowing the absence of a start
and a stop bit. In the second case these bits allow the communication among
devices without a common clock signal.
6.2 Standard serial protocols and buses
6.2.1 I2C
The I2C synchronous serial protocol allows the communication among devices within a limited distance. It makes use of two lines: the devices are connected to SDA (Serial Data Line) and SCL (Serial Clock Line). Moreover a device can be a master or a slave, but only masters are able to start a transaction. Each packet is delimited by a start condition and a stop condition and has the following properties:
• the start condition occurs whenever the SCL is 1 and SDA changes
from 1 to 0 followed by a transition of SCL to 0.
• the stop condition occurs whenever the SCL is 1 and SDA changes from
0 to 1.
• the internal structure is made of the address of the slave device, the
type of the operation (R/W), an ACK signal for the correctness of the
address, the data byte and another ACK signal generated by the slave.
SDA and SCL are open-drain lines and require pull-up resistors.
6.2.2 SPI
The SPI protocol is synchronous and only one device can fulfill the role of the
master. The protocol makes use of four special lines: SCLK (Serial Clock),
MOSI (Master Output Slave Input), MISO (Master Input Slave Output)
and SS (Slave Select, one per each slave). All MISO signals are wired-or
connected on a single line therefore whenever a slave device is not active (SS
high), its MISO output must be in high impedance mode.
The communication starts when the master activates one slave, changing the status of the related SS line from high to low. Then the master generates a clock signal on the SCLK line and begins a full-duplex communication thanks to the MISO and MOSI lines.
A daisy-chain connection is also allowed among master and slaves. In
this case there is a single SS line for all the slaves and they are connected in
sequence: the first slave reads data on MOSI and propagates them on MISO
connected to the MOSI port of the next slave device.
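This daisy-chain behaviour can be modelled as one long shift register; a toy sketch (the byte values are arbitrary):

```python
# SPI daisy chain modelled at byte granularity: on every full byte exchange,
# the master's output byte enters slave 0, each slave passes its previous
# byte to the next one, and the last slave's byte reaches the master on MISO.
def spi_daisy_exchange(master_out, slave_regs):
    miso = slave_regs[-1]                     # byte returned to the master
    shifted = [master_out] + slave_regs[:-1]  # bytes ripple down the chain
    return miso, shifted

slaves = [0x11, 0x22, 0x33]                   # registers of slaves 0..2
received, slaves = spi_daisy_exchange(0xAA, slaves)
print(hex(received))              # 0x33
received, slaves = spi_daisy_exchange(0xBB, slaves)
print(hex(received))              # 0x22
print([hex(b) for b in slaves])   # ['0xbb', '0xaa', '0x11']
```

With n slaves in the chain, a byte written by the master needs n exchanges to reach the last slave, which is the price of using a single SS line.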
6.2.3 Protocols analysis
SPI has a better throughput compared to I2C because of the full-duplex communication and the lack of addressing. On the other hand, the SPI protocol can support only one master and does not provide an acknowledge mechanism, unlike the I2C protocol.
6.2.4 CAN bus
The CAN bus is based on a differential serial communication protocol that allows the exchange of small data packets called frames. It works in asynchronous mode and does not need an explicit arbiter thanks to the implementation of a priority mechanism: a high-priority message wins the bus and can use it, while a lower-priority message detects that the bus is busy and waits. The structure of each bit is peculiar: first of all there is a synchronization segment with a fixed duration, then the propagation segment takes care of the delays introduced by the physical bus lines and, at the end, two variable-length segments keep the clocks synchronized.
6.3 Parallel communication
The parallel communication uses several lines to carry data and it is more expensive than the serial solution. The speed and the overall performance are better than in the serial communication, but electromagnetic interference among the lines has to be taken into account. Moreover the design of parallel interfaces is much more complex.
The group of lines used for the communication is often called "bus" and
the lines are divided into data bus, address bus and control bus.
6.3.1 Parallel protocols
The strobe protocol consists of a request control line. As soon as the master asserts it, the slave has a predefined interval of time to put the data on the bus; then the master reads this data and deasserts the request line, ending the communication. The assumption is that the slave is always able to provide the data in time.
The handshake protocol works in a similar way, but provides an extra
acknowledgment line to make the master aware of the data availability after
the assertion of the request line by the master itself. The disadvantage is
that the transaction takes more time.
A mixed protocol is also possible. The strobe/handshake uses a wait line
instead of an acknowledgement one, that can be asserted by the slave in case
of data unavailability.
6.4 Wireless communication
The wireless communication is also possible, but it is not as reliable as cable communications. WiFi has a great speed, but the disadvantages are the power consumption and the coverage. Bluetooth has a very good power consumption if the low-energy variant is used, but it is suitable only for short-range communications. GSM provides another communication method that is good as far as coverage is concerned, but costs and power consumption are very high.
7 [AOS] Interfacing
While the word communication is used when two or more systems exchange information, the word interfacing considers the data transfer among components of the same system. Usually the system is composed of a microprocessor, a memory and some peripherals, and data exchange is performed thanks to a few techniques such as polling, interrupts and DMA.
The microprocessor has to specify the device, memory or peripheral, to talk to. In order to do that there are different solutions:
• Memory mapped I/O: the address space is divided into pieces and each one is assigned to a specific device. In this case the system only needs one shared bus among the memory and the peripherals. All the different devices can be mapped on non-contiguous addresses.
• Standard I/O: this solution benefits from an additional line added to the control bus in order to specify whether the address refers to a peripheral or to a memory location. The advantage is that the I/O devices do not reduce the address space available for the memory.
• Port mapped I/O: the bus is used only by the memory, while the other devices communicate directly with the microprocessor through dedicated lines. The disadvantage is the poor scalability of this mechanism; on the other hand, the address space reserved for the memory is not reduced, exactly as in the standard I/O case.
The following ones are solutions used to figure out when to access a device
(for instance, a data is ready to be read).
First of all, the status of the device can be continuously checked by the microprocessor by reading specific registers (polling). This solution is characterized by some disadvantages: the microprocessor has to read the registers of the device even if data is not ready, wasting its computing power on useless operations. Moreover the data-ready rate of the devices can be very different and this affects the complexity of the polling routines.
In order to avoid the waste of computing power, the interrupt mechanism
can be implemented. The interrupt is a signal sent by the peripherals to the
microprocessor pointing out that available data has to be read. The com-
munication is completely asynchronous. This requires at least one interrupt
request line often called INT. At the end of each assembly instruction, the
microprocessor checks the value of the interrupt line: if it is active, the mi-
croprocessor halts the current program execution and executes the interrupt
service routine to read the available data from the requesting peripheral.
After that, the execution of the program resumes.
The interrupt mechanism can be fulfilled in many ways: each peripheral
can have its own interrupt line or one single line can be shared among the
peripherals. In this case when a peripheral makes a request, the others are
inhibited while the microprocessor carries out the request itself. Moreover the peripherals can be connected to each other and crossed by the interrupt line. This has the advantage of establishing a simple priority mechanism.
One of the main issues is the association of the interrupt signal to the
correct peripheral, such that the right routine is executed. The address of the interrupt service routine can be fixed, therefore when an interrupt occurs the microprocessor accesses a predefined memory area. The limitations of this solution are worked out by the vectored interrupt mechanism: whenever an interrupt occurs, the peripheral writes on the bus the address related to its own service routine. The microprocessor now knows the location of the ISR and jumps to that location to execute it.
Sometimes it may be needed to disable completely interrupt requests, for
example during the execution of some critical sections of code. In this case
interrupts can be masked: the easiest way is to use the assembly instruction
DI (disable interrupts). There are some interrupt requests that must always be carried out: those requests are called NMI (Non-Maskable Interrupts).
The DMA controller has the capability to exchange data between peripherals
and memory without affecting the computing power of the microprocessor.
The peripheral is directly connected to the DMA controller and not to the
microprocessor. When an interrupt is generated, the DMA controller requests the bus from the processor, which replies with an ACK signal. The main
disadvantage is that for all the duration of the data transfer, the bus cannot
be exploited by the microprocessor. Moreover the DMA controller needs the
address of the peripheral, the address of memory where to write data and
the quantity of data that has to be transferred.
8 Hardware Technologies
The hardware technologies allow the realization of the analog and digital circuits at the base of any embedded system. The designer can choose among several solutions:
• COTS (Commercial Off The Shelf): components produced in large volumes that are available on the market and can be used in our system. They are directly mounted on the board and assure a small time-to-market, but they are not always optimized in terms of power consumption and performance.
• Microprocessors: if the designer needs to introduce complex functions that are not offered by standard components, microprocessors are the right choice. The software is in charge of providing the specific function.
• Programmable logic devices: this kind of solution offers a huge amount of logic, storage and communication resources. The designer has to configure the device, which is very flexible and more efficient than a software-only solution.
• ASICs: integrated circuits developed for one specific purpose only. Their performance and optimized power consumption are not comparable to those of the previous solutions. In order to be affordable, the production volume must be very high.
Large-scale projects might take advantage of hardware blocks that have
already been developed. As far as Intellectual Property (IP) is concerned,
the reusability of a design can be classified in two ways. The Hard Macro
IP cannot be modified by the designer: this preserves the original
specifications of the block and prevents reverse engineering. Only the
external interfaces are specified. The Soft Macro IP provides a thorough
description (possibly a VHDL or Verilog one) of the block that can be easily
integrated in our design.
Hardware design flow
The hardware design flow has the purpose to convert a synthesizable speci-
fication into a digital circuit. The flow is divided in two parts: the front-end
and the back-end. The front-end starts with a HDL description of the hard-
ware blocks. Each block is verified using a specific test-bench, optimized and
technologically mapped. The back-end deals with the floorplanning, that is
the repartition of the silicon area associated to each functional block, then
the layout of the circuit is built by the placement and routing phases defining
the geometry needed to manufacture the required masks. As far as programmable
logic devices are concerned, the back-end is extremely simplified because cells
and interconnections are already present on the device, which only needs to
be programmed.
8.1 ASIC Prototyping Technologies
8.1.1 The planar process
The planar process is at the base of the ASIC technology. Silicon is used as
the semiconductor, while interconnections among components are made of copper.
The process allows the following elements to be obtained on the starting pure
silicon: p-type Silicon, n-type Silicon, an insulator to isolate the active
zones of p and n Silicon (SiO2 is commonly used), and a conductor. The phases
of the planar process are the following:
1. In order to make the starting Silicon layer very pure, the impurities are
ruled out using a hot coil that has the purpose to collect the impurities
themselves in a specific area only. After that this area is removed.
2. Starting from the wafer, a circular Silicon layer, the deposition of the
photoresist takes place.
3. The wafer is exposed to UV light in order to make specific zones of
the photoresist strong enough to resist chemical agents. This is the
masking phase.
4. In the washing phase, those parts of the photoresist that have not been
exposed to the UV light are removed.
5. p and n-type Silicon regions are created and a new process of photoresist
deposition, masking and washing is carried out.
6. As soon as all the active zones are ready, the wafer is covered by a layer
of Silicon dioxide, that is later masked and removed.
7. Lastly a thin layer of conductive material (Aluminum) is used to create
contacts on the active parts of our device.
The manufacturing of an ASIC using the planar process can require on the
order of 25-30 masks. This is why making ASICs is extremely expensive.
8.1.2 Standard cell and gate array technologies
The following technologies simplify the ASIC manufacturing process. The
standard cell technology limits the degrees of freedom offered by the planar
process in order to keep the prototyping complexity of digital circuits under
control. The back-end phase of the project is simplified by standardizing
some aspects. A cell is a small digital circuit that has already been designed
and optimized. It can fulfill different roles such as logic gate, sequential
component or memory. The structure of a standard cell chip is characterized
by a pad ring, rows that implement the functional logic and channels that
contain the interconnection lines.
The gate array technology simplifies the ASIC manufacturing process
starting from a partially available chip that provides groups of p and n tran-
sistors. Moreover at the top and at the bottom of the gate array a Vdd rail
and a Vss rail can be found. The peculiarity of this technology is that only
the interconnections among the transistors must be described to get to the
final circuit.
8.1.3 Full custom
The full custom technology offers the designer the maximum flexibility and
degree of freedom in prototyping and developing digital circuits. Moreover,
the designer is not obliged to follow any geometrical constraint regarding
the layout of the circuit.
8.2 Programmable Technologies
The prototyping activity of the designer can be further simplified by the use
of programmable logic devices (PLDs) that integrate logic resources and
programmable interconnections. Unlike an ASIC, a PLD only has to be
programmed in order to provide the desired functionalities and does not
require chip manufacturing. Any PLD has a specific programming mode:
One Time Programmable: OTP devices can be programmed one time only. They
implement a fuse or antifuse technology based on the modification of the
internal physical structure of the device.
Reprogrammable: the device can be configured multiple times. The cost is
higher than that of an OTP device and the devices implement a volatile or
non-volatile memory (EEPROM).
In-system programmable: the device can be dynamically and selectively
programmed even while it is working and in use (SRAM, Flash memory).
Two remarkable factors that characterize any programmable logic device
are the organization of the cells and the structure of the interconnections.
The cells are often arranged in a very regular, matrix-like structure.
Highly cell-populated zones can alternate with interconnection zones, or the
distribution of cells and interconnections can be uniform over the whole
chip surface (FPGA). Furthermore, the interconnections strongly determine
the performance and the latency of a PLD. A global connection crosses the
whole chip while a local connection deals with a small number of cells.
The FPGA is the evolution of some old programmable devices such as PAL,
PLA and GAL. It is characterized by a distributed and matrix-like structure
of the logic resources. In the middle, block-RAM can be typically found
while the cells on the edge are in charge of I/O operations. In the corners
there can be usually found some blocks dedicated to specific functions such
as CLK generators, digital PLLs and modules devoted to the programming
of the device. The FPGA is the most flexible and powerful choice when the
production volume is not very large; it makes the prototyping phase faster
and has a good cost-performance ratio. The product design using an FPGA is
almost exclusively front-end related.
8.3 Feasibility Study
The main purpose of the feasibility study is to analyze and evaluate the ad-
vantage of a proposed solution. It can deal with an existing product that
is experiencing a technological evolution or a new one that has to be de-
signed from scratch. The main aspects of the study are the elicitation of the
characteristics of the system-to-be, the selection of the suppliers and a deep
analysis of the costs related to the proposed solution, also with respect to
the possibly existing one.
The functional requirements and constraints of the final product may de-
termine the architecture that will be used, for example the choice of ASIC
instead of COTS and programmable logic such as FPGA. In this case the
possible benefits are easier PCB routing and reduced electromagnetic
interference, since pin-out problems are worked out. The front-end and
the back-end of the design flow can be carried out by different entities: a
design center and a silicon foundry. They both determine significant costs.
The front-end costs mainly deal with HDL code development, analysis, testing
and simulation, while the back-end costs are mostly related to IP blocks,
the production of the masks and the manufacturing process.
The contents of this page are personal re-elaborations by the publisher hardware994 of material from the Embedded Systems lectures and independent study of any reference books, in preparation for the final exam or the thesis. They are not to be intended as official material of Politecnico di Milano (Polimi) or of prof. William Fornaciari.