EXPERIMENTS AND IMPLEMENTATION
Abstract - The main purpose of this chapter is to present the experimental results of vehicle detection on a Raspberry Pi 3, using a boosted cascade of simple Haar features applied to images captured by a Syma H8.0 Mini Ultradrone.
4.1. Haar-like features
In order to develop a real-time monitoring system of urban traffic, the main starting point is the detection of cars in a real-world environment. There are several object recognition algorithms in the literature. However, as our aim is to run them on portable Raspberry Pi on-board computers for real-time analyses, we need an approach to object detection which minimizes computation time and resources while achieving medium detection accuracy.
Our chosen technique uses Haar-like features and cascade classifiers which are “trained” to match cars on real roads. Haar-like features are digital image features used in object recognition. They owe
their name to their intuitive similarity with Haar wavelets. Haar-like features were proposed by
Viola and Jones as an alternative method for object detection. The general idea was to describe an
object as a cascade of simple feature classifiers organized into several stages.
In the detection phase of the Viola–Jones object detection framework, a window of the target size is
moved over the input image, and for each subsection of the image the Haar-like feature is
calculated.
A simple rectangular Haar-like feature can be defined as the difference of the sums of pixels of adjacent rectangular areas, which can be at any position and scale within the original image. A feature composed of two such rectangles is called a 2-rectangle feature. Viola and Jones also defined 3-rectangle features and 4-rectangle features. The values indicate certain characteristics of a particular area of the image. Each feature type can indicate the existence (or absence) of certain characteristics in the image, such as edges or changes in texture. For example, a 2-rectangle feature can indicate where the border lies between a dark region and a light region.
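As an illustrative sketch (not code from this work), the value of such a 2-rectangle feature can be computed directly as a difference of pixel sums; the function name and window size below are arbitrary:

import numpy as np

def two_rect_feature(window):
    # Difference between the pixel sum of the left half and the right
    # half of the window: a strong response marks a vertical edge.
    h, w = window.shape
    left = int(window[:, : w // 2].sum())
    right = int(window[:, w // 2:].sum())
    return left - right

# A window that is bright on the left and dark on the right
window = np.hstack([np.full((20, 15), 200), np.full((20, 15), 30)])
print(two_rect_feature(window))  # large positive value: an edge is present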
Figure 39: Viola and Jones features
Lienhart and Maydt introduced the concept of a tilted (45°) Haar-like feature. This was used to
increase the dimensionality of the set of features in an attempt to improve the detection of objects in
images. This was successful, as some of these features are able to describe the object in a better
way. For example, a 2-rectangle tilted Haar-like feature can indicate the existence of an edge at 45°.
Figure 40: Lienhart and Maydt features
Messom and Barczak extended the idea to a generic rotated Haar-like feature. Although the idea is mathematically sound, practical problems prevent the use of Haar-like features at arbitrary angles: in order to be fast, detection algorithms use low-resolution images, which introduce rounding errors. For this reason rotated Haar-like features are not commonly used.
The difference of the sums of pixels inside the rectangles is then compared to a learned threshold that separates non-objects from objects. Because such a Haar-like feature is only a weak learner or classifier, a large number of Haar-like features is necessary to describe an object with sufficient accuracy. Haar-like features are therefore organized into a classifier cascade to form a strong learner or classifier.
The idea consists of using the first stages of the cascade to effectively discard most of the regions of the image that contain no objects. This is done by adjusting each classifier’s threshold so that the false negative rate is close to zero. By discarding many candidate regions early in the cascade, Viola and Jones significantly improve the method’s performance.
Viola and Jones set up a framework to combine several features into a cascade, i.e. a sequence of
tests on the image or on particular regions of interest, organized into several stages, each based on
the results of one or more different Haar features. For an object to be recognized, it must pass
through all of the stages of the cascade.
Figure 41: schema of cascade of boosted Haar features
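A conceptual sketch of this early-rejection scheme, in Python (an illustration of the idea only, not the actual OpenCV implementation; the stage representation is hypothetical):

def window_passes_cascade(window, stages):
    # stages: list of (weak_classifiers, stage_threshold) pairs, where each
    # weak classifier is a (feature_fn, weight) pair voting on the window.
    for weak_classifiers, stage_threshold in stages:
        stage_sum = sum(weight * feature_fn(window)
                        for feature_fn, weight in weak_classifiers)
        if stage_sum < stage_threshold:
            return False  # rejected at an early stage: most windows stop here
    return True  # the window passed every stage and is reported as an object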
The key advantage of a Haar-like feature over most other features is its calculation speed. Thanks to the use of integral images, a Haar-like feature of any size can be calculated in constant time.
The integral image representation is introduced in order to compute these features very rapidly at many scales. The integral image can be computed from an image using a few operations per pixel. Once computed, any of these Haar-like features can be computed at any scale or location in constant time.
When creating an integral image, we need to create a Summed Area Table. If we go to any point (x, y) in this table, we find the sum of all the pixel values above and to the left of (x, y), including the pixel value of (x, y) itself. What is really good about the Summed Area Table is that we are able to construct it with only one pass over the given image:

s(x, y) = i(x, y) + s(x-1, y) + s(x, y-1) - s(x-1, y-1)

We take the original pixel value i(x, y) from the image, then add the values directly above and directly to the left of this pixel from the Summed Area Table, s(x-1, y) and s(x, y-1). Finally, we subtract the value directly top-left of i(x, y) from the Summed Area Table, that is, s(x-1, y-1); entries outside the table are taken as zero.
Example:
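Consider, for instance, a small illustrative 3x3 image i and its Summed Area Table s:

    i = | 1 2 3 |        s = |  1   3   6 |
        | 4 5 6 |            |  5  12  21 |
        | 7 8 9 |            | 12  27  45 |

Each entry of s follows from the recurrence above: for example s(1, 1) = 5 + 3 + 5 - 1 = 12, and the bottom-right entry, 45, is the sum of all nine pixels.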
Once we have used the equation to calculate and fill up the Summed Area Table, the task of calculating the sum of pixels in some rectangle which is a subset of the original image can be done in constant time, that is, with O(1) complexity.
In order to do this we only need 4 values from the Summed Area Table, that is, 4 array references into the Summed Area Table. With these 4 values, added or subtracted appropriately, we obtain the sum of the pixels within that region. For a rectangle with top-left corner (x0, y0) and bottom-right corner (x1, y1), the equation is:

sum = s(x1, y1) - s(x0-1, y1) - s(x1, y0-1) + s(x0-1, y0-1)
Each Haar-like feature may need more than four lookups, depending on how it was defined. Viola
and Jones's 2-rectangle features need six lookups, 3-rectangle features need eight lookups, and 4-
rectangle features need nine lookups.
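To make the bookkeeping concrete, here is a minimal Python/NumPy sketch (illustrative only; the function names are our own) that builds a Summed Area Table and evaluates a rectangle sum with the four lookups described above:

import numpy as np

def summed_area_table(img):
    # One pass over the image: cumulative sums along rows, then columns.
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(sat, x0, y0, x1, y1):
    # Sum of the pixels in the rectangle with top-left corner (x0, y0)
    # and bottom-right corner (x1, y1), inclusive: 4 table lookups.
    # Lookups that would fall outside the table are treated as zero.
    a = sat[y0 - 1, x0 - 1] if x0 > 0 and y0 > 0 else 0
    b = sat[y0 - 1, x1] if y0 > 0 else 0
    c = sat[y1, x0 - 1] if x0 > 0 else 0
    return int(sat[y1, x1]) - int(b) - int(c) + int(a)

img = np.arange(1, 10).reshape(3, 3)   # the 3x3 example above
sat = summed_area_table(img)
print(rect_sum(sat, 1, 1, 2, 2))       # 5 + 6 + 8 + 9 = 28

A Haar-like feature value is then just the difference of two or more such sums, each obtained in constant time regardless of the rectangle size. OpenCV offers the same functionality through cv2.integral, which returns a table padded with a leading row and column of zeros, so the border checks above disappear.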
4.2. Cascade classifier training
The work with a cascade classifier includes two major stages: training and detection. In this subchapter we describe how to train a cascade classifier: the preparation of the training data and the running of the training application.
The attempt is to train a generalized cascade for the detection of car rears, i.e. a Haar cascade that detects not a particular car model but car rears indiscriminately. The Haar features are first trained to obtain a representation to be used later for real-time object detection.
The steps for training a Haar classifier can be divided into:
- Creating the description files of positive and negative samples
- Packing the positive samples into a vec file
- Training the classifier
- Converting the trained cascade into an xml file
4.2.1. Training datasets description
The first step of a Haar classifier training process regards the determination of the positive and negative image sets. A positive image contains the target object which we want our machine to detect; a negative image, on the contrary, does not contain the target object.
We need a large number of images of our object. It would be a really long operation to shoot all the positive and negative images individually, as they can run into the thousands. For this reason we came up with some strategies.
One common approach is to shoot a video of our object and then write a simple programme to extract all the frames (a minimal sketch is given below). This can be really helpful because, for example, a 25 fps video of 1 minute will yield 1500 static pictures. This is the strategy adopted in order to create the first dataset of both positive and negative samples (for the set of negative examples it is enough to use the video frames in which no cars are present and the road is clear). This dataset contains 526 images of cars (360 x 240 pixels, no scale, jpeg format) and 1370 background images (279 x 186 pixels, jpeg format). These images are extracted from the car dataset proposed by Brad Philip and Paul Updike, taken on the freeways of southern California.
Figure 42: few samples from dataset 1 of positive images
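A minimal sketch of such a frame-extraction programme, using OpenCV's Python bindings (the file and folder names here are purely illustrative):

import os
import cv2

# Read a video and dump every frame as a numbered jpeg image.
os.makedirs("frames", exist_ok=True)
video = cv2.VideoCapture("traffic.avi")  # illustrative file name
count = 0
while True:
    ok, frame = video.read()
    if not ok:
        break  # end of the video
    cv2.imwrite("frames/frame_%05d.jpg" % count, frame)
    count += 1
video.release()
print("%d frames extracted" % count)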
A second tactic is to download a generic catalogue of cars from image source websites, like ImageNet. This is the basic idea exploited in order to gather two other positive datasets: one formed by 196 jpg files of variable dimension (less than 100 x 100 pixels) and another formed by 2611 bmp files of fixed dimension (320 x 238 pixels).
A second negative dataset is created using 1964 images (100 x 100 pixels) resulting from the search of the keywords "athletics" and "sports" on ImageNet. The last negative set is actually the Brad and Paul dataset with just a format conversion (jpeg to bmp) and a dimensional stretching (to 320 x 238 pixels) in order to have the same height and width as the corresponding positive dataset.
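Such a batch conversion and resize can be done with a few lines of OpenCV (a sketch assuming the source images live in a negative/ folder; all paths are illustrative):

import glob
import os
import cv2

# Convert every jpeg in negative/ to a 320x238 bmp in negative_bmp/.
os.makedirs("negative_bmp", exist_ok=True)
for path in glob.glob("negative/*.jpg"):
    img = cv2.imread(path)
    img = cv2.resize(img, (320, 238))  # stretch to the target size
    name = os.path.splitext(os.path.basename(path))[0]
    cv2.imwrite(os.path.join("negative_bmp", name + ".bmp"), img)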
Table 1: characteristics of training datasets

            Dataset 1              Dataset 2              Dataset 3
type        Positive   Negative    Positive   Negative    Positive   Negative
images      526        1370        196        1964        2611       1370
pixels      360x240    279x186     variable   100x100     320x238    320x238
format      jpg        jpg         jpg        jpg         bmp        bmp
4.2.2. Training setup
Once the positive and negative images are prepared, we put them in two different folders called
positive and negative. The next step is the creation of description files for both positive and
negative images. The description file is just a text file, with each line corresponding to each image.
The fields in a line of the positive description file are: the image name, followed by the number of
objects to be detected in the image, which is followed by the x,y coordinates of the location of the
object in the image. Some images may contain more than one objects.
For example:
C:\Users\MehediAnam\Documents\UNI\TESI\TRAINING\dataset 2\positive 2\car1.jpg 2 80 51 114 100 79 49 116 99
The description file of positive images can be created using the object marker program, or simply with the command line dir /b > positive.txt once we are in the directory of the image files (the latter produces the bare list of file names, to which the object counts and coordinates are then added).
In order to go on with the following steps, we first make sure to have the full path of our datasets' directory.
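As an illustration, a description file can also be written with a short script once the bounding boxes are known (the annotations dictionary below is hypothetical):

# Write an OpenCV-style positive description file.
# Each value is a list of (x, y, w, h) bounding boxes for that image.
annotations = {
    "car1.jpg": [(80, 51, 114, 100), (79, 49, 116, 99)],  # from the example above
}

with open("positive.txt", "w") as f:
    for name, boxes in annotations.items():
        fields = [name, str(len(boxes))]
        for (x, y, w, h) in boxes:
            fields += [str(x), str(y), str(w), str(h)]
        f.write(" ".join(fields) + "\n")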
Now we can move on to creating the samples for training. All the positive images in the description file are packed into a .vec file, created using the createsamples utility provided with any version of the OpenCV package. Its synopsis is:
%OPENCV_FULL_DIRECTORY%/opencv_createsamples.exe -info positive.txt -vec samples.vec -w 30 -h 20
Our positive image descriptor file is named positive.txt and the name chosen for the vec file is samples.vec. We then choose the minimum width of the object to be detected as 30 pixels and the height as 20 pixels. These two parameters are really important because they determine the minimum dimension of the detection window.
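As a side note, the same utility can be used to visually inspect the packed samples by passing only the vec file and the sample size:

%OPENCV_FULL_DIRECTORY%/opencv_createsamples.exe -vec samples.vec -w 30 -h 20

This displays the stored 30 x 20 patches one by one, which is a quick sanity check on the annotations.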
Although real-world images give better performance, createsamples is a very useful utility to create positive training samples.