
"MPG",

"Cylinders",

"Displacement",

"Horsepower",

"Weight",

"Acceleration",

"Model Year",

"Origin",

]

data = pd.read_csv(

url, names=column_names, na_values="?", comment="\t", sep=" ",

skipinitialspace=True

)

data

Check if there are missing entries in the dataset.

print(data.isna().sum())

Remove records with missing entries.

data = data.dropna()

print(data.isna().sum())

## Data inspection

Display some basic information.

data.head()

data.info()

data.describe()

We are interested in predicting the field `MPG`, measuring [fuel efficiency](https://en.wikipedia.org/wiki/Fuel_efficiency#:~:text=Fuel%20economy%20is%20the%20distance,a%20certain%20volume%20of%20fuel), expressed in miles per gallon (MPG), where 1 MPG = 0.354006 km/L. Plot its distribution.

sns.displot(data["MPG"], kde=True)
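As a side note, the conversion factor quoted above corresponds to miles per imperial gallon (1 mi = 1.609344 km, 1 imperial gallon = 4.54609 L):

$$
1\ \textnormal{MPG} = \frac{1.609344\ \textnormal{km}}{4.54609\ \textnormal{L}} \approx 0.354\ \textnormal{km/L}.
$$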

Look for linear correlations among the data.

data.corr()

sns.heatmap(data.corr(), annot=True, cmap="vlag_r", vmin=-1, vmax=1)

sns.pairplot(data, diag_kind="kde")

## Data normalization

Apply an affine transformation to the data, so that each feature has zero mean and unit standard deviation.

data_mean = data.mean()

data_std = data.std()

data_normalized = (data - data_mean) / data_std

_, ax = plt.subplots(figsize=(16, 6))

sns.violinplot(data=data_normalized, ax=ax)
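Since the target `MPG` is normalized together with the features, predictions made in normalized units have to be mapped back to the original scale. A minimal sketch, assuming the `data_mean` and `data_std` computed above (the helper name is ours, not from the notebook):

def denormalize_mpg(y_norm):
    # illustrative helper: invert the affine normalization for the MPG column
    return y_norm * data_std["MPG"] + data_mean["MPG"]

print(denormalize_mpg(0.0))  # equals the dataset mean MPG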


## Train-validation split

Shuffle the data using the [np.random.shuffle](https://numpy.org/doc/stable/reference/random/generated/numpy.random.shuffle.html) function and split the data as follows:

- put 80% in the train dataset

- put 20% in the validation dataset

data_normalized_np = data_normalized.to_numpy()

np.random.seed(0)

np.random.shuffle(data_normalized_np)

fraction_validation = 0.2

num_train = int(data_normalized_np.shape[0] * (1 - fraction_validation))

x_train = data_normalized_np[:num_train, 1:]

y_train = data_normalized_np[:num_train, :1]

x_valid = data_normalized_np[num_train:, 1:]

y_valid = data_normalized_np[num_train:, :1]

print("train set size : %d" % x_train.shape[0])

print("validation set size: %d" % x_valid.shape[0])

## ANN setup

Write a function `params = initialize_params(layers_size)` that initializes the parameters, given the ANN architecture.

Initialize biases with zero values, and weights with a [Glorot Normal](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf) initialization, i.e. sampling from a Gaussian distribution with zero mean and standard deviation

$$
\sqrt{\frac{2}{n + m}},
$$

where $n$ and $m$ are the numbers of input and output neurons of the corresponding weight matrix.

def initialize_params(layers_size):
    np.random.seed(0)  # for reproducibility
    params = list()
    for i in range(len(layers_size) - 1):
        W = np.random.randn(layers_size[i + 1], layers_size[i]) * np.sqrt(
            2 / (layers_size[i + 1] + layers_size[i])
        )
        b = np.zeros((layers_size[i + 1], 1))
        params.append(W)
        params.append(b)
    return params
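As a quick sanity check (a sketch, not part of the exercise text), the empirical standard deviation of a sampled weight matrix should be close to $\sqrt{2/(n+m)}$:

params_check = initialize_params([7, 10, 1])
W0 = params_check[0]  # first weight matrix, shape (10, 7)
print(W0.std(), np.sqrt(2 / (10 + 7)))  # the two values should be comparable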

Implement a generic feedforward ANN with a function `y = ANN(x, params)`, using $\mathrm{ReLU}$ as the activation function.


activation = jax.nn.relu

def ANN(x, params):
    layer = x.T
    num_layers = int(len(params) / 2 + 1)
    weights = params[0::2]
    biases = params[1::2]
    for i in range(num_layers - 1):
        layer = weights[i] @ layer + biases[i]  # affine layer: W x + b
        if i < num_layers - 2:
            layer = activation(layer)
    return layer.T

params = initialize_params([7, 10, 1])
ANN(x_train[:10, :], params)
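For a batch of 10 samples and the `[7, 10, 1]` architecture, the output should have shape `(10, 1)`, matching the layout of `y_train`; a quick check:

print(ANN(x_train[:10, :], params).shape)  # expected: (10, 1)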

Implement the quadratic loss (MSE) function `L = MSE(x, y, params)`.

def MSE(x, y, params):
    error = ANN(x, params) - y
    return jnp.mean(error * error)

params = initialize_params([7, 10, 1])
print(MSE(x_train, y_train, params))

Implement an $l^2$ regularization term for the ANN weights:

$$

\mathrm{MSW} = \frac{1}{n_{weights}} \sum_{i=1}^{n_{weights}} w_i^2

$$

and define the loss function as

$$

\mathcal{L} = \mathrm{MSE} + \beta \, \mathrm{MSW}

$$

where $\beta$ is a suitable penalization parameter.

def MSW(params):
    weights = params[::2]
    partial_sum = 0.0
    n_weights = 0
    for W in weights:
        partial_sum += jnp.sum(W * W)
        n_weights += W.shape[0] * W.shape[1]
    return partial_sum / n_weights

def loss(x, y, params, penalization):
    return MSE(x, y, params) + penalization * MSW(params)

print(MSW(params))
print(loss(x_train, y_train, params, 1.0))

Run this cell: we will use this callback to monitor training.

from IPython import display

class Callback:
    def __init__(self, refresh_rate=250):
        self.refresh_rate = refresh_rate
        self.fig, self.axs = plt.subplots(1, figsize=(16, 8))
        self.epoch = 0
        self.__call__(-1)

    def __call__(self, epoch):
        self.epoch = epoch
        if (epoch + 1) % self.refresh_rate == 0:
            self.draw()
            display.clear_output(wait=True)
            display.display(plt.gcf())
            time.sleep(1e-16)

    def draw(self):
        if self.epoch > 0:
            self.axs.clear()
            self.axs.loglog(history_loss_train, "b-", label="loss train")
            self.axs.loglog(history_loss_valid, "r-", label="loss validation")
            self.axs.loglog(history_MSE_train, "b--", label="MSE train")
            self.axs.loglog(history_MSE_valid, "r--", label="MSE validation")
            self.axs.legend()
            self.axs.set_title("epoch %d" % (self.epoch + 1))

## Training

Train an ANN with two hidden layers of 20 neurons each, using 5000 epochs of the SGD method (with minibatch size 100) with momentum ($\alpha = 0.9$).
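For reference, the SGD-with-momentum update implemented in the training loop below reads

$$
v^{(k+1)} = \alpha \, v^{(k)} - \lambda_k \, \nabla_{\theta} \mathcal{L}\big(\theta^{(k)}\big), \qquad
\theta^{(k+1)} = \theta^{(k)} + v^{(k+1)},
$$

where $\theta$ collects weights and biases, $v$ is the velocity, and $\lambda_k$ is the learning rate at epoch $k$, defined next.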

Employ a linear decay of the learning rate:

$$
\lambda_k = \max\left(\lambda_{\textnormal{min}}, \lambda_{\textnormal{max}} \left(1 - \frac{k}{K}\right)\right)
$$

with $\lambda_{\textnormal{min}} = 5 \cdot 10^{-3}$, $\lambda_{\textnormal{max}} = 10^{-1}$ and decay length $K = 1000$.
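A quick sketch (not required by the exercise) to visualize this schedule with the values above:

epochs_plot = np.arange(5000)
lr_schedule = np.maximum(5e-3, 1e-1 * (1 - epochs_plot / 1000))
plt.plot(epochs_plot, lr_schedule)
plt.xlabel("epoch")
plt.ylabel("learning rate")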

During training, store both the MSE and the loss function evaluated on the train and validation sets in 4 lists, respectively called:

- `history_loss_train`

- `history_loss_valid`

- `history_MSE_train`

- `history_MSE_valid`

Test different choices of the penalization parameter $\beta$.

# Hyperparameters
layers_size = [7, 20, 20, 1]
penalization = 2.0

# Training options
num_epochs = 5000
learning_rate_max = 1e-1
learning_rate_min = 5e-3
learning_rate_decay = 1000
batch_size = 100
alpha = 0.9

########################################

params = initialize_params(layers_size)

grad = jax.grad(loss, argnums=2)
MSE_jit = jax.jit(MSE)
loss_jit = jax.jit(loss)
grad_jit = jax.jit(grad)

n_samples = x_train.shape[0]

history_loss_train = list()
history_loss_valid = list()
history_MSE_train = list()
history_MSE_valid = list()

def dump():
    history_loss_train.append(loss_jit(x_train, y_train, params, penalization))
    history_loss_valid.append(loss_jit(x_valid, y_valid, params, penalization))
    history_MSE_train.append(MSE_jit(x_train, y_train, params))
    history_MSE_valid.append(MSE_jit(x_valid, y_valid, params))

dump()
cb = Callback(refresh_rate=500)

velocity = [0.0 for i in range(len(params))]

for epoch in range(num_epochs):
    learning_rate = max(
        learning_rate_min, learning_rate_max * (1 - epoch / learning_rate_decay)
    )
    idxs = np.random.choice(n_samples, batch_size)
    grads = grad_jit(x_train[idxs, :], y_train[idxs, :], params, penalization)
    for i in range(len(params)):
        velocity[i] = alpha * velocity[i] - learning_rate * grads[i]
        params[i] += velocity[i]
    dump()
    cb(epoch)

cb.draw()

print("loss (train     ): %1.3e" % history_loss_train[-1])
print("loss (validation): %1.3e" % history_loss_valid[-1])
print("MSE  (train     ): %1.3e" % history_MSE_train[-1])
print("MSE  (validation): %1.3e" % history_MSE_valid[-1])

We now want to investigate in more depth the effect of the penalization parameter $\beta$.

Write a function that, given the penalization parameter, trains the ANN (with the same settings used above) and returns a dictionary containing the final values of:

- train MSE

- validation MSE

- MSW

# Hyperparameters
layers_size = [7, 20, 20, 1]

# Training options
num_epochs = 5000
learning_rate_max = 1e-1
learning_rate_min = 5e-3
learning_rate_decay = 1000
batch_size = 100
alpha = 0.9

def train(penalization):
    params = initialize_params(layers_size)
    n_samples = x_train.shape[0]
    velocity = [0.0 for i in range(len(params))]
    for epoch in range(num_epochs):
        learning_rate = max(
            learning_rate_min, learning_rate_max * (1 - epoch / learning_rate_decay)
        )
        idxs = np.random.choice(n_samples, batch_size)
        grads = grad_jit(x_train[idxs, :], y_train[idxs, :], params, penalization)
        for i in range(len(params)):
            velocity[i] = alpha * velocity[i] - learning_rate * grads[i]
            params[i] += velocity[i]
    return {
        "MSE_train": MSE(x_train, y_train, params),
        "MSE_valid": MSE(x_valid, y_valid, params),
        "MSW": MSW(params),
    }

Using the function defined above, store the results obtained for $\beta = 0, 0.25, 0.5, 0.75, \dots, 2$.

results = {
    "MSE_train": list(),
    "MSE_valid": list(),
    "MSW": list(),
}

pen_values = np.arange(0, 2.1, 0.25)  # beta = 0, 0.25, ..., 2, as requested above
for p in pen_values:
    print("training for p = %f..." % p)
    res = train(p)
    results["MSE_train"].append(res["MSE_train"])
    results["MSE_valid"].append(res["MSE_valid"])
    results["MSW"].append(res["MSW"])

Plot the trend of the three quantities as functions of $\beta$.

_, axs = plt.subplots(1, 3, figsize=(12, 6))

axs[0].plot(pen_values, results["MSE_train"], "o-")

axs[1].plot(pen_values, results["MSE_valid"], "o-")

axs[2].plot(pen_values, results["MSW"], "o-")

axs[0].set_title("MSE_train")

axs[1].set_title("MSE_valid")

axs[2].set_title("MSW")

Plot the _Tikhonov L-curve_, which in this context is the curve of train MSE versus MSW; moving along the curve shows the trade-off between data fit and weight magnitude as $\beta$ varies.

plt.plot(results["MSW"], results["MSE_train"], "o-")

plt.xlabel("MSW")

plt.ylabel("MSE_train")

# First-order training (optimization) methods

import numpy as np

import matplotlib.pyplot as plt

import time

import jax.numpy as jnp

import jax

Let us consider the following function


$$

f(x) = e^{-\frac{x}{10}}\sin(x) + \frac{1}{10} \cos(\pi x)

$$

defined over the interval $[0, 10]$.

f = lambda x: np.sin(x) * np.exp(-0.1 * x) + 0.1 * np.cos(np.pi * x)

a, b = 0, 10
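A quick sketch to visualize $f$ on the interval $[a, b]$ defined above:

x_plot = np.linspace(a, b, 500)
plt.plot(x_plot, f(x_plot))
plt.xlabel("x")
plt.ylabel("f(x)")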

Define a function `get_training_data` that returns a collection of `N` training samples.
