C Introduction, Images
Friday, 6 March 2020, 19:52
1983 def: AI is the study of how to make computers do things at which, at the moment, people are
better.
2018 def: AI is the theory and development of computer systems able to perform tasks normally
requiring human intelligence, such as visual perception, speech recognition, decision making, and
translation between languages.
Some definitions
Artificial Intelligence is the ability of a system to:
- Analyze the environment: perception.
  ○ Perception, sensing, vision, NLP, speech analysis, IoT processing…
- Display intelligent behavior: reasoning.
  ○ Learning, understanding, reasoning, planning, having emotions…
- Take actions: action.
  ○ Moving and stopping, grasping, suggesting and recommending…
- Act with a certain degree of autonomy (a human can still take control of the system).
- Work towards a specific goal.
AI is an intelligence different from the human one, but controlled by humans.
Vision: use intelligence to understand the world from visual stimuli.
- Visual intelligence: a cognitive capability to think and reason through mental images.
- Computer vision is the science that tries to reconstruct the 3D world and understand the world
  from images and videos.
From AI to Computer Vision:
- Artificial Intelligence: the scientific field that studies how to create computers and computer
  software that are capable of intelligent behaviors, using sensing, perception, knowledge,
  reasoning and learning.
- Machine Learning: the scientific discipline that studies how to construct algorithms that can
  learn from and make predictions on data, for getting computers to act without being explicitly
  programmed.
- Deep Learning: a branch of Machine Learning for modeling and implementing deep neural
  network architectures and algorithms.
- Pattern Recognition: the scientific discipline that studies how to classify or recognize patterns
  and observed data using a-priori knowledge, statistical information and learning.
- Computer Vision: the scientific discipline that studies how computers can perceive and
  understand the world through visual data (and provide a visual intelligence).
- Cognitive Computing: describes technology platforms that are based on scientific disciplines of
  Artificial Intelligence and Signal Processing. These platforms include machine learning, reasoning,
  natural language processing, speech and vision, human-computer interaction, dialog and
  narrative generation and more.
From Computer Vision to Machine Vision:
- Image Processing: the scientific discipline that studies how to automatically modify images to
  create other images with changed or enhanced properties (e.g. filtering, denoising, compressing).
- Image (Video) Analysis: the scientific discipline that studies how to automatically process images
  and videos to extract visual information (e.g. measuring object shapes, counting people in a mall).
- Imaging: the science that studies how to modify, manipulate and process images, manually or
  automatically, for further tasks (e.g. photo correction, medical imaging).
Vision and Cognitive Systems Pagina 1
- Machine Vision: the engineering field that studies how to build computer vision-based systems,
  services and solutions, typically for industrial environments.
ABC of Images and Processing Operators
Different image definitions:
- The image is a discrete representation of a continuous 2D function.
- The image is a pixel matrix.
- The image is a 2D tensor used as input of many algorithms.
- For image processing, the image is a two-dimensional signal, sampled and quantized.
Photometry: studies the measurement of brightness.
Colorimetry: studies the wavelength and the color emission.
Histograms are a good way to represent images, but they lose the spatial information of the image.
A histogram is a plot of the number of pixels for each tonal value. The left side of the horizontal axis
represents the dark areas, the middle represents mid-tone values and the right hand side represents
light areas. The vertical axis represents the size of the area (total number of pixels) that is captured in
each one of these zones. For an image $f$ and a grey level $k$:
$$h(k) = \#\{(x, y) : f(x, y) = k\}$$
(the $\#$ operator counts the number of elements)
The Normalized Histogram is the histogram divided by the total number of pixels $N$ and represents the
probability distribution of the pixel values: $p(k) = h(k) / N$.
The Cumulative Histogram defines the cumulative distribution function corresponding to $p$ (the
probability distribution, i.e. the normalized histogram). It is the probability of having a pixel value
less than or equal to $k$:
$$c(k) = \sum_{i=0}^{k} p(i), \quad k = 0, \dots, L - 1$$
($L$ is the total number of grey levels in the image)
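The three histograms can be computed in a few lines. This is a minimal sketch, assuming an 8-bit style integer image stored as a NumPy array; the function name `histograms` and the tiny test image are illustrative, not part of the course material.

```python
import numpy as np

def histograms(image, levels=256):
    """Return the histogram, the normalized histogram and the cumulative histogram."""
    # h[k] = number of pixels whose value is k
    h = np.bincount(image.ravel(), minlength=levels)
    # p[k] = h[k] / N, the empirical probability of value k
    p = h / image.size
    # c[k] = p[0] + ... + p[k], the empirical cumulative distribution
    c = np.cumsum(p)
    return h, p, c

# Tiny 2x3 image with 4 grey levels (hypothetical data).
img = np.array([[0, 1, 1],
                [3, 3, 3]], dtype=np.uint8)
h, p, c = histograms(img, levels=4)
print(h)  # [1 2 0 3]
```

Note that two images with the same pixels in a different arrangement give exactly the same three arrays, which is the loss of spatial information mentioned above.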
Image Entropy specifies the uncertainty in the image values. It measures the average amount of
information required to encode the image values. A high value indicates that the values are spread
evenly over many levels; a low value indicates that a few values dominate the distribution of the
color.
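As a worked example, the entropy can be taken as the Shannon entropy of the normalized histogram $p$, $H = -\sum_k p(k) \log_2 p(k)$ (in bits per pixel). The sketch below assumes this definition and an integer-valued NumPy image; `image_entropy` and the two test images are hypothetical names and data.

```python
import numpy as np

def image_entropy(image, levels=256):
    """Shannon entropy (bits per pixel) of the grey-level distribution."""
    h = np.bincount(image.ravel(), minlength=levels)
    p = h / image.size
    p = p[p > 0]  # levels that never occur contribute 0 (0 * log 0 is taken as 0)
    return float(-np.sum(p * np.log2(p)))

flat = np.full((4, 4), 7, dtype=np.uint8)               # a single grey level
spread = np.arange(16, dtype=np.uint8).reshape(4, 4)    # 16 equally frequent levels

entropy_flat = image_entropy(flat)      # no uncertainty at all
entropy_spread = image_entropy(spread)  # log2(16) bits
```

The constant image needs no information to encode its values, while the image with 16 equally frequent levels needs 4 bits per pixel.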
Saturated arithmetic is used in imaging when the result of a processing operation must be another
image: it is not correct from a mathematical point of view, but the result can still be displayed.
If a transformation gives a value $v$ less than $0$, the value becomes $0$; if it gives more than
$255$ (the maximum of an 8-bit image), the value becomes $255$ ($v' = \min(\max(v, 0), 255)$).
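The difference between saturated and plain machine arithmetic can be seen directly. This sketch assumes 8-bit images (range 0 to 255) held in NumPy `uint8` arrays; the offset of 60 and the sample values are arbitrary.

```python
import numpy as np

img = np.array([[10, 200],
                [250, 0]], dtype=np.uint8)

# Plain uint8 arithmetic wraps around modulo 256: 250 + 60 becomes 54.
wrapped = img + np.uint8(60)

# Saturated arithmetic: compute in a wider type, then clamp to [0, 255].
saturated = np.clip(img.astype(np.int16) + 60, 0, 255).astype(np.uint8)

print(wrapped.tolist())    # [[70, 4], [54, 60]]
print(saturated.tolist())  # [[70, 255], [255, 60]]
```

The wrapped result turns bright pixels dark, which is why the clamped version is preferred when the output has to be viewed as an image.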
Image Processing Operators
In the following, $f$ is the original image, $g$ the resulting image and $T$ the operator.
- Point operators: the value of each pixel of the resulting image depends only on the original pixel
  in the same spatial position: $g(x, y) = T(f(x, y))$
- Local (Neighborhood) operators: the value of each pixel depends only on the original pixel in the
  same spatial position and on its local neighborhood $N(x, y)$:
  $g(x, y) = T(\{f(i, j) : (i, j) \in N(x, y)\})$
- Global operators: the value of each pixel depends on all the pixels of the original image:
  $g(x, y) = T(f)$
Point operators offer the possibility to parallelize the operations.
Linear point operator:
If $T$ is a linear transformation that transforms an image $f$ into another image $g$ through a
processing operation, we can write:
$$g(x, y) = a \, f(x, y) + b$$
where $a$ is the scale factor, called gain or contrast, and $b$ is the offset constant, called bias or
brightness.
Contrast stretching: the expansion of the gray (or color) levels of the pixels over the whole dynamic
range, given the histogram.
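A gain/bias operator and a simple contrast stretch can be sketched as follows. The assumptions are an 8-bit image, saturation to [0, 255], and a stretch that maps the minimum and maximum of the image onto 0 and 255 (other variants use histogram percentiles); the function names are illustrative.

```python
import numpy as np

def linear_point_op(f, gain=1.0, bias=0.0):
    """g = gain * f + bias, rounded and saturated to the 8-bit range."""
    g = gain * f.astype(np.float64) + bias
    return np.clip(np.rint(g), 0, 255).astype(np.uint8)

def contrast_stretch(f):
    """Map [min(f), max(f)] linearly onto [0, 255]."""
    lo, hi = int(f.min()), int(f.max())
    if hi == lo:               # flat image: nothing to stretch
        return f.copy()
    gain = 255.0 / (hi - lo)
    return linear_point_op(f, gain=gain, bias=-gain * lo)

f = np.array([[50, 100],
              [150, 200]], dtype=np.uint8)
brighter = linear_point_op(f, gain=2.0)
stretched = contrast_stretch(f)
print(stretched.tolist())  # [[0, 85], [170, 255]]
```

Each output pixel depends only on the input pixel at the same position, so the operation can be applied to all pixels in parallel.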
C Filtering
Saturday, 14 March 2020, 20:15
Point processing gets intensity of a single pixel as input and it is unaware of spatial information.
Neighborhood processing instead, takes spatial information into account.
Noise
Noise is everything that is not signal, that is, everything that is not useful information. Noise is
also whatever keeps us from seeing something relevant in the image.
Images are normally affected by noise. Image noise is a random variation of brightness or color
information that is not present in the imaged object. It can be produced by sensors, cameras,
transmissions, artifacts, etc. We can define 3 types of noise:
- Signal noise: the most important, it is an additive noise (something added to the image).
- Computational noise: it is an error, acting as noise, produced by computational tasks with
  approximations or computational limitations (e.g. saturated arithmetic).
- Perceptual noise: called distractors, it is everything in the image which is not the target of the
  image itself. It is something that distracts us from the image and decreases the perception and
  understanding of the rest.
In order to perform computer vision tasks, noise must be removed. We would like to eliminate all
types of noise. In general, the noise can be:
- Salt and pepper noise: random occurrences of black and white pixels.
- Impulse noise: random occurrences of white pixels.
- Gaussian noise: variations in intensity drawn from a Gaussian (normal) distribution.
The signal noise in an image is supposed to be white.
White noise is a random signal (or process) with a flat power spectral density. If a time series is
normally distributed with zero mean and standard deviation $\sigma$, the series is called Gaussian
white noise.
Gaussian i.i.d. white noise: $n(x, y) \sim \mathcal{N}(0, \sigma^2)$, independently for each pixel.
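To experiment with these noise models, synthetic noise can be added to a clean image. This is a minimal sketch assuming 8-bit images and NumPy's random generator; the function names, the fixed seed, $\sigma = 10$ and the 10% corruption rate are all arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility

def add_gaussian_noise(f, sigma):
    """Additive i.i.d. Gaussian noise with zero mean, saturated to 8 bits."""
    noise = rng.normal(0.0, sigma, size=f.shape)
    return np.clip(np.rint(f + noise), 0, 255).astype(np.uint8)

def add_salt_and_pepper(f, fraction):
    """Set about `fraction` of the pixels to black (pepper) or white (salt)."""
    g = f.copy()
    r = rng.random(f.shape)
    g[r < fraction / 2] = 0          # pepper
    g[r > 1 - fraction / 2] = 255    # salt
    return g

f = np.full((64, 64), 128, dtype=np.uint8)   # uniform grey test image
noisy_gauss = add_gaussian_noise(f, sigma=10.0)
noisy_sp = add_salt_and_pepper(f, fraction=0.1)
```

Gaussian noise perturbs every pixel a little, while salt-and-pepper noise leaves most pixels untouched and destroys a few completely; this difference is what makes different filters suitable for each.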
Image restoration: the image processing subfield which aims at improving the image quality by
eliminating noise and artefacts.
Types of noise for image restoration:
- Without other information, noise is supposed to be additive and Gaussian, that is a stochastic
  value added to the pixel value, completely uncorrelated with the signal and with the noise added
  to other pixels. It is always present, especially with low cost cameras.
- Impulsive noise (salt and pepper) is a random modification of pixel values.
- Multiplicative noise, in which the pixel values are multiplied by a factor that depends on the
  type of sensor.
- Speckle noise, in which we have both additive and multiplicative noise.
Noise Reduction and Filtering
In surveillance, images are usually affected by noise because we use low cost, low quality cameras.
To reduce the noise, we can apply a smoothing process. The simplest one is the average of the values:
we replace each pixel with the average of all the values in its neighborhood, in order to smooth the
high frequencies. We look for a transformation that maps a signal $f$ to a signal $g$, where $g(x, y)$
is the average of $f$ around $(x, y)$.
The average (mean) filter is a linear filter. It makes the image blurred: averaging attenuates the
high frequencies that carry the edges, so we no longer see the edges clearly.
Linear filtering
Given an image $f$, linear filtering is a process that gives in output a new image $g$, where the
value at each location is a weighted sum of the original pixel values in a neighborhood, using the
same set of weights each time.
The result is:
- Shift-invariant: the output value depends only on the pattern in the image neighborhood and not
  on the position in the image.
- Linear: the output obtained from the sum of two images is the same as the sum of the two
  outputs obtained separately (superposition of effects).
The pattern of weights used for a linear filter is usually called the kernel of the filter. The process
that applies the kernel or filter is usually called correlation or convolution.
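Filtering:
Given a signal $f$, a filter $h$ whose indices range between $-k$ and $k$ is applied as:
$$g(i, j) = \sum_{u=-k}^{k} \sum_{v=-k}^{k} h(u, v) \, f(i + u, j + v)$$
This is a dot product, or (cross-)correlation.
Correlation and convolution give identical results when the kernel coefficients are symmetric, which
is the case for many kernels used in computer vision.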
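The relation between correlation and convolution can be checked numerically. The sketch below is a deliberately naive implementation (explicit loops, square odd-sized kernel, zero padding), written for clarity rather than speed; `correlate2d` and `convolve2d` are local helper names, not library calls.

```python
import numpy as np

def correlate2d(f, h):
    """Cross-correlation of image f with a (2k+1)x(2k+1) kernel h, zero padding."""
    k = h.shape[0] // 2
    padded = np.pad(f.astype(np.float64), k, mode="constant")
    g = np.zeros(f.shape, dtype=np.float64)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            window = padded[i:i + 2 * k + 1, j:j + 2 * k + 1]
            g[i, j] = np.sum(window * h)   # dot product of window and kernel
    return g

def convolve2d(f, h):
    """Convolution is correlation with the kernel flipped along both axes."""
    return correlate2d(f, h[::-1, ::-1])

f = np.arange(25, dtype=np.float64).reshape(5, 5)
box = np.full((3, 3), 1 / 9)       # symmetric kernel
shift = np.zeros((3, 3))
shift[1, 2] = 1.0                  # asymmetric kernel: picks the right neighbor

same_for_symmetric = np.allclose(correlate2d(f, box), convolve2d(f, box))
same_for_asymmetric = np.allclose(correlate2d(f, shift), convolve2d(f, shift))
print(same_for_symmetric, same_for_asymmetric)  # True False
```

With the asymmetric kernel, correlation copies the right neighbor of each pixel while convolution copies the left one, so the two operations only coincide for symmetric kernels.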
Difference between continuous and discrete signals:
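As we said before, linear filters have some properties (here $f$, $f_1$, $f_2$ are images, $a$ is a
scalar, $h$ is the kernel and $*$ is the filtering operation):
- Linearity: $h * (f_1 + f_2) = h * f_1 + h * f_2$
- Scalarity: $h * (a \, f) = a \, (h * f)$
- Shift-invariance: if $g = h * f$, shifting $f$ by $(x_0, y_0)$ shifts $g$ by the same $(x_0, y_0)$.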
Padding:
If we apply a filter at the border, we obtain wrong results because part of the neighborhood falls
outside the image. To solve this border problem, we use padding:
- Zero padding: insert 0-valued pixels.
- Constant padding: insert a specific color in the border.
- Clamp to edges: repeat the edge value.
- Wrap: loop around in a toroidal configuration.
- Mirror: reflect the edges.
In CV, zero padding is usually considered the best choice, because we only need the information in
the central part and we lose information at each processing step anyway. When we do convolutions, a
border strip as wide as half of the kernel is lost forever, because its values are computed from
padded data and are therefore wrong.
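The five padding strategies map onto the `mode` argument of `numpy.pad`. The sketch uses a 1D row so the effect is easy to read; the pad width of 2 and the constant value 9 are arbitrary. Mirror is shown with `"symmetric"` (the edge value is repeated); NumPy's `"reflect"` mode is the variant that does not repeat it.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

zero     = np.pad(row, 2, mode="constant", constant_values=0)  # [0 0 1 2 3 4 0 0]
constant = np.pad(row, 2, mode="constant", constant_values=9)  # [9 9 1 2 3 4 9 9]
clamp    = np.pad(row, 2, mode="edge")                         # [1 1 1 2 3 4 4 4]
wrap     = np.pad(row, 2, mode="wrap")                         # [3 4 1 2 3 4 1 2]
mirror   = np.pad(row, 2, mode="symmetric")                    # [2 1 1 2 3 4 4 3]
```

The same calls work on 2D images, where the pad width is applied to every side.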
Smoothing:
Mean filter: obtained by moving an average filter over the image (smoothing or blurring).
A low-pass blurring is given by averaging each pixel with its neighbors; it corresponds to convolving
the image with a kernel of 1 values and then scaling (e.g. a $k \times k$ kernel filled with $1/k^2$).
We can apply as many filters as we want.
Cons: it also attenuates useful information with the same spatial frequency as the noise (blurring)
and it does not work on salt-and-pepper noise.
[Figure: in this image, we want to maintain the indicator.]
Gaussian filter: the best filter to smooth Gaussian noise. It is an isotropic mask given by a Gaussian
function with zero mean and a given standard deviation, convolved with the image.
The filter must be discretized by choosing $k$, that is the filter size, and the standard deviation
$\sigma$.
In a filter, we choose a value of about
For a barcode, we need to apply a Gaussian with a kernel that is smaller w.r.t. the information we have.
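A discretized Gaussian mask can be built by sampling $G(x, y) \propto e^{-(x^2 + y^2) / (2\sigma^2)}$ on an integer grid and normalizing the weights. This is a sketch under the assumption of an odd, square kernel; the size 5 and $\sigma = 1$ below are illustrative values only, not the ones recommended in the course.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """size x size sampled Gaussian, normalized so that the weights sum to 1."""
    if size % 2 == 0:
        raise ValueError("use an odd size so the kernel has a center pixel")
    k = size // 2
    x = np.arange(-k, k + 1)
    g1d = np.exp(-x**2 / (2.0 * sigma**2))
    kernel = np.outer(g1d, g1d)      # the 2D Gaussian is separable
    return kernel / kernel.sum()

kernel = gaussian_kernel(size=5, sigma=1.0)
```

Normalizing by the sum (instead of using the analytic constant $1 / (2\pi\sigma^2)$) keeps the mean brightness of the image unchanged even when the kernel truncates the tails of the Gaussian.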
Linear filters can be combined and applied in cascade. Ex. sharpening filter:
Average filter
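Since the original figure is missing, the following is only one common way to obtain a sharpening kernel by combining linear filters, and it is an assumption that it matches the example meant here: twice the identity kernel minus an average kernel.

```python
import numpy as np

identity = np.zeros((3, 3))
identity[1, 1] = 1.0                  # leaves the image unchanged
average = np.full((3, 3), 1 / 9)      # 3x3 mean filter

# "Original + (original - blurred)": boosts what the average filter removes.
sharpen = 2 * identity - average
```

Because filtering is linear, applying this single kernel gives the same result as filtering with the two kernels separately and combining the outputs; the weights still sum to 1, so flat regions keep their brightness.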
Great ideas of CNNs:
- Use only the receptive field and not all the inputs.
- Use the same set of convolutive kernel weights.
- Use many kernels in parallel to create a feature map or feature vector.
- Feature maps are processed through non-linear activation functions and compressed by pooling
  to work in multiresolution.
- Feature vectors are used for final inference, using fully connected layers to take decisions.
Non Linear Filters
Min/Max filter: the minimum (respectively maximum) value in the moving region of the original image is
the result of the min (respectively max) filter (e.g. max pooling).
Median filter: a non-linear filter, useful to clean and mitigate the effect of salt-and-pepper or
impulsive noise. The output pixel is the median value of the neighborhood. It is also possible to
compute a weighted median filter.
[Figure: application of the median filter. If we apply a Gaussian filter to the first image, we spread
the noise.]
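The different behavior of median and mean filters on impulsive noise shows up even on a tiny image. This is a naive sliding-window sketch with edge-replicated borders; `filter2d` is a hypothetical helper that applies any reducer (`np.median`, `np.mean`, `np.min`, `np.max`) to each window.

```python
import numpy as np

def filter2d(f, size, reducer):
    """Apply `reducer` to every size x size window of f (edge-replicated borders)."""
    k = size // 2
    padded = np.pad(f, k, mode="edge")
    g = np.empty_like(f)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            g[i, j] = reducer(padded[i:i + size, j:j + size])
    return g

f = np.full((5, 5), 100, dtype=np.uint8)
f[2, 2] = 255                        # a single "salt" pixel

median = filter2d(f, 3, np.median)   # the outlier disappears
mean = filter2d(f, 3, np.mean)       # the outlier is smeared over its 3x3 neighborhood
maximum = filter2d(f, 3, np.max)     # max filter: the outlier grows
```

The median output is 100 everywhere, while the mean output raises the nine pixels around the impulse to 117: the noise is spread rather than removed.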
Variable-valued Filters
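Bilateral filter: combines a weighted filter kernel with outlier rejection. Every sample is replaced
by a weighted average of its neighbors. These weights reflect how close and how similar the neighbor
and the center sample are (larger weight to close and similar samples).
Bilateral weighted filter: in a neighborhood of the image $f$, the result is a normalized
weighted sum:
$$g(i, j) = \frac{\sum_{k, l} f(k, l) \, w(i, j, k, l)}{\sum_{k, l} w(i, j, k, l)}, \quad w(i, j, k, l) = d(i, j, k, l) \, r(i, j, k, l)$$
Domain kernel (commonly Gaussian): $d(i, j, k, l) = \exp\left(-\frac{(i - k)^2 + (j - l)^2}{2 \sigma_d^2}\right)$
Range kernel (commonly Gaussian): $r(i, j, k, l) = \exp\left(-\frac{(f(i, j) - f(k, l))^2}{2 \sigma_r^2}\right)$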
Bilateral filter is not shift-invariant.
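A direct, unoptimized implementation makes the two kernels explicit. This sketch assumes a greyscale image, Gaussian domain and range kernels, and edge-replicated borders; the parameter names `sigma_d` and `sigma_r` and the noisy step image are illustrative choices.

```python
import numpy as np

def bilateral_filter(f, size, sigma_d, sigma_r):
    """Naive bilateral filter on a greyscale image."""
    k = size // 2
    f = f.astype(np.float64)
    padded = np.pad(f, k, mode="edge")
    # Domain kernel: depends only on spatial distance, so it is computed once.
    y, x = np.mgrid[-k:k + 1, -k:k + 1]
    domain = np.exp(-(x**2 + y**2) / (2.0 * sigma_d**2))
    g = np.empty_like(f)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            window = padded[i:i + size, j:j + size]
            # Range kernel: depends on the intensity difference from the center,
            # so the weights change from pixel to pixel.
            range_k = np.exp(-(window - f[i, j])**2 / (2.0 * sigma_r**2))
            w = domain * range_k
            g[i, j] = np.sum(w * window) / np.sum(w)
    return g

# Noisy vertical step edge (hypothetical data): left half ~50, right half ~200.
rng = np.random.default_rng(0)
step = np.where(np.arange(16) < 8, 50.0, 200.0) * np.ones((16, 1))
noisy = step + rng.normal(0.0, 5.0, size=(16, 16))
smoothed = bilateral_filter(noisy, size=5, sigma_d=2.0, sigma_r=20.0)
```

Pixels on the other side of the step differ by about 150 grey levels, so their range weight is practically zero: the noise inside each half is averaged away while the edge stays sharp. The weights are recomputed at every pixel from the image content, which is why the filter is not shift-invariant.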
Denoising
Block-matching and 3D filtering: exploits non-local statistics of the image (different patches in the
same image are often similar in appearance).
Deblurring
Blur: degradation of sharpness and contrast of the image, causing loss of high frequencies.
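It is due to:
- Camera motion.
- Camera defocus.
Blur is a generative process: the observed image $g$ is the sharp image $f$ convolved with a blur
kernel $h$, plus noise $n$:
$$g = h * f + n$$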
This function is only invertible if the noise is zero. If there is noise, the recovered image has bad quality.
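The effect of noise on the inversion can be illustrated in 1D. The sketch assumes the usual model (observed signal = sharp signal circularly convolved with a known blur kernel, plus additive noise) and inverts it by dividing in the Fourier domain; the kernel weights, the noise level and the random test signal are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
f = rng.random(n)                    # hypothetical sharp 1D signal

h = np.zeros(n)                      # circular blur kernel centered at index 0
h[0], h[1], h[-1] = 0.6, 0.2, 0.2
H = np.fft.fft(h)                    # 0.6 + 0.4*cos(w): never zero, so invertible

def degrade(signal, noise_sigma):
    """Blur by circular convolution, then optionally add Gaussian noise."""
    blurred = np.real(np.fft.ifft(np.fft.fft(signal) * H))
    if noise_sigma > 0:
        blurred = blurred + rng.normal(0.0, noise_sigma, size=signal.shape)
    return blurred

def inverse_filter(g):
    """Undo the blur by dividing by the kernel spectrum."""
    return np.real(np.fft.ifft(np.fft.fft(g) / H))

err_clean = np.abs(inverse_filter(degrade(f, 0.0)) - f).max()
err_noisy = np.abs(inverse_filter(degrade(f, 0.01)) - f).max()
```

Without noise the original signal is recovered up to rounding error; with even a small amount of noise the division amplifies the high-frequency noise components and the recovered signal is visibly degraded.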
C Edges
Monday, 16 March 2020, 17:47
Typical process pipeline:
1) Low level vision: filtering, edge detection and selection, segmentation.
2) Image analysis: labeling, visual features extraction.
3) Camera calibration.
4) High level vision: clustering, unsupervised classification.
Feature extraction: the task of extracting an $n$-dimensional vector representing some visual
properties. The features have to summarize the image content: it is a quantization or compression
problem. A more compact representation is necessary, and it also helps generalization, avoiding
overfitting.
Criteria to take into account in order to select the visual features:
1) Discriminant property: features must assume values that are significantly different for objects
belonging to different classes.
2) Reliability property: features must assume values that are similar for objects belonging to the
same class.
3) Independence property: features must be independent of each other (select the best features to
avoid overfitting, not linear combinations).
4) Minimum cardinality property: features must be as few as possible.
Gestalt also defines other properties for computer science:
- Invariant properties: scaling, translation, geometric contraction, rotation…
- Subjective relevance property: independence from luminance changes.
To find contours in an image we can use local variations, for example we can look at the areas where
luminance changes.
Edges
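Contours and borders are global properties of a region, whereas edges are local properties.
An edge is a local property of a pixel and its neighborhood: a rapid intensity variation. An edge is a
vector with magnitude and direction. It depends on the luminance variation and we can compute it as
a gradient. The edge direction is perpendicular to the gradient direction:
$$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right), \quad |\nabla f| = \sqrt{\left( \frac{\partial f}{\partial x} \right)^2 + \left( \frac{\partial f}{\partial y} \right)^2}, \quad \theta = \operatorname{atan2}\left( \frac{\partial f}{\partial y}, \frac{\partial f}{\partial x} \right)$$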
The edges can be created by different sources of luminance variation, mainly 4:
Edges can be defined also as points or set of points where there is a high gradient.
To detect borders we need to:
1) Use an edge detection operator (edge detector).
2) Select strong edges with some given criteria.
3) Link the edges (labeling).
The problem is that the noise creates many false edge points.
Edge Detection Algorithms
- Methods based on first derivative computation (Sobel).
- First derivative and regularization techniques using filtering and optimal masks (Canny).
- Border following local techniques based on neighborhood operation of labeled edges and then
  segmentation (Border tracing).
- Classification (Deep learning).
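Discrete detectors:
Images have discrete values, so we use discrete derivatives. We have 3 types:
- Forward difference: $f'(x) \approx f(x + 1) - f(x)$
- Backward difference: $f'(x) \approx f(x) - f(x - 1)$
- Central difference: $f'(x) \approx \frac{f(x + 1) - f(x - 1)}{2}$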
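On a sampled signal the three differences reduce to array slicing. The sketch uses samples of $x^2$ (an arbitrary test signal), whose exact derivative is $2x$.

```python
import numpy as np

f = np.array([0.0, 1.0, 4.0, 9.0, 16.0])   # samples of x**2 at x = 0..4

forward = f[1:] - f[:-1]            # f(x+1) - f(x),       valid for x = 0..3
backward = f[1:] - f[:-1]           # f(x) - f(x-1),       valid for x = 1..4
central = (f[2:] - f[:-2]) / 2.0    # (f(x+1) - f(x-1))/2, valid for x = 1..3

print(forward)   # [1. 3. 5. 7.]
print(central)   # [2. 4. 6.]
```

Forward and backward differences contain the same numbers but assign them to different positions; each is off by half a sample. The central difference is aligned with the pixel grid and here reproduces the exact derivative $2x$ at $x = 1, 2, 3$.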
Sobel Edge Detector Operator
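The Sobel operator uses a first central derivative along one direction, smoothed along the
orthogonal direction.
Ex. (horizontal derivative): $S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$, with $S_y = S_x^T$.
We use the Sobel filter and compute the magnitude of the gradient. If the magnitude is higher than the
threshold, the pixel is an edge.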
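The Sobel detector can be sketched with the standard $3 \times 3$ Sobel kernels. The implementation below is a minimal NumPy version with edge-replicated borders; the helper names, the synthetic step image and the threshold of 200 are illustrative assumptions.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)  # derivative in x, smoothing in y
SOBEL_Y = SOBEL_X.T                                  # derivative in y, smoothing in x

def correlate3x3(f, h):
    """Correlate f with a 3x3 kernel by summing nine shifted copies of the image."""
    padded = np.pad(f.astype(np.float64), 1, mode="edge")
    g = np.zeros(f.shape)
    for u in range(3):
        for v in range(3):
            g += h[u, v] * padded[u:u + f.shape[0], v:v + f.shape[1]]
    return g

def sobel_edges(f, threshold):
    gx = correlate3x3(f, SOBEL_X)
    gy = correlate3x3(f, SOBEL_Y)
    magnitude = np.hypot(gx, gy)     # sqrt(gx**2 + gy**2)
    return magnitude, magnitude > threshold

f = np.zeros((6, 6))
f[:, 3:] = 100.0                     # vertical step edge between columns 2 and 3
magnitude, edges = sobel_edges(f, threshold=200.0)
```

On this step image the response is 400 in the two columns next to the step and 0 elsewhere, so the detected edge is two pixels wide; thinning it to a single response is one of the things Canny's method adds.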
Canny Edge Detector
A good operator should satisfy 3 criteria:
- Good detection: we need to maximize the signal-to-noise ratio (SNR).
- Good localization: the detected edge must be as close as possible to the position of the true edge.
- One response to a single edge: one answer for one edge, with few false positive edges (false
  edges due to noise or artifacts).
Since finite difference filters respond too strongly to noise, we have to apply a Gaussian filter $G$
and then the derivative, i.e. the derivative of the Gaussian (DOG). Filtering with the derivative of
the Gaussian is equal to the Gaussian smoothing of the derivative:
$$\frac{\partial}{\partial x} (G * f) = \frac{\partial G}{\partial x} * f = G * \frac{\partial f}{\partial x}$$
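The identity follows from the associativity of convolution and can be verified numerically in 1D. In this sketch the Gaussian is sampled with $\sigma = 1$ on 9 taps and the derivative is a central difference written as a convolution kernel; both are illustrative choices.

```python
import numpy as np

x = np.arange(-4, 5)
gauss = np.exp(-x**2 / 2.0)
gauss /= gauss.sum()                      # sampled 1D Gaussian, sigma = 1
deriv = np.array([0.5, 0.0, -0.5])        # central difference as a convolution kernel

rng = np.random.default_rng(0)
signal = rng.random(50)                   # hypothetical noisy 1D signal

# Two passes: smooth first, then differentiate.
smooth_then_derive = np.convolve(np.convolve(signal, gauss), deriv)

# One pass: convolve once with the precomputed derivative-of-Gaussian kernel.
dog = np.convolve(gauss, deriv)
one_pass = np.convolve(signal, dog)

print(np.allclose(smooth_then_derive, one_pass))  # True
```

Precomputing the derivative-of-Gaussian kernel means the image only has to be filtered once per direction. The kernel weights sum to zero, so flat regions give no response.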
Canny proposed the best continuous filter in 1D and then discretized and extended it to 2D:
1) Add a smoothing filter to keep high the signal-to-noise ratio.
2) Find the true directions of the gradients in order to extract only one edge.
3) Suppress the false edges.
Canny algorithm:
1) Smooth the image $I$ with a 2D Gaussian filter $G$ (convolution of the Gaussian with the image):
$S = G * I$
2) Find the local edge normal direction for each pixel: $\mathbf{n} = \frac{\nabla S}{|\nabla S|}$
3) Compute the edge magnitude of the gradient, to understand the strength of the gradient at each
position: $|\nabla S| = \sqrt{S_x^2 + S_y^2}$
4) Non-maximum suppression, that is to take just only the maximum of