"MPG",
"Cylinders",
"Displacement",
"Horsepower",
"Weight",
"Acceleration",
"Model Year",
"Origin",
]
data = pd.read_csv(
url, names=column_names, na_values="?", comment="\t", sep=" ",
skipinitialspace=True
)
data
Check if there are missing entries in the dataset.
print(data.isna().sum())
Remove records with missing entries.
data = data.dropna()
print(data.isna().sum())
## Data inspection
Display some basic information.
data.head()
data.info()
data.describe()
We are interested in predicting the field `MPG`, measuring [fuel efficiency](https://en.wikipedia.org/wiki/Fuel_efficiency#:~:text=Fuel%20economy%20is%20the%20distance,a%20certain%20volume%20of%20fuel), expressed in miles per gallon (MPG), where 1 MPG = 0.354006 km/L. Plot its distribution.
sns.displot(data["MPG"], kde=True)
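As a quick illustration of the unit conversion stated above (an optional check, not part of the exercise; the variable name `mpg_km_per_l` is just illustrative):
mpg_km_per_l = data["MPG"] * 0.354006  # 1 MPG = 0.354006 km/L
print(mpg_km_per_l.describe())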
Look for linear correlations among data.
data.corr()
sns.heatmap(data.corr(), annot=True, cmap="vlag_r", vmin=-1, vmax=1)
sns.pairplot(data, diag_kind="kde")
## Data normalization
Apply an affine transformation to the data, so that each feature has zero mean and
unitary standard deviation.
data_mean = data.mean()
data_std = data.std()
data_normalized = (data - data_mean) / data_std
_, ax = plt.subplots(figsize=(16, 6))
sns.violinplot(data=data_normalized, ax=ax)
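As an optional sanity check (not requested by the text), verify that every normalized feature indeed has approximately zero mean and unit standard deviation:
# Each column should now have mean ~ 0 and standard deviation ~ 1.
print(data_normalized.mean().round(6))
print(data_normalized.std().round(6))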
## Train-validation split
Shuffle the data using the [np.random.shuffle](https://numpy.org/doc/stable/reference/random/generated/numpy.random.shuffle.html) function and split the data
as follows:
- put 80% in the train dataset
- put 20% in the validation dataset
data_normalized_np = data_normalized.to_numpy()
np.random.seed(0)
np.random.shuffle(data_normalized_np)
fraction_validation = 0.2
num_train = int(data_normalized_np.shape[0] * (1 - fraction_validation))
x_train = data_normalized_np[:num_train, 1:]
y_train = data_normalized_np[:num_train, :1]
x_valid = data_normalized_np[num_train:, 1:]
y_valid = data_normalized_np[num_train:, :1]
print("train set size : %d" % x_train.shape[0])
print("validation set size: %d" % x_valid.shape[0])
## ANN setup
Write a function `params = initialize_params(layers_size)` that initializes the
parameters, given the ANN architecture.
Initialize biases with zero values, and weights with a [Glorot Normal](http://
proceedings.mlr.press/v9/glorot10a/glorot10a.pdf) initialization, i.e. sampling from a
Gaussian distribution with zero mean and with standard deviation
$$
\sqrt{\frac{2}{n + m}},
$$
where $n$ and $m$ are the number of input and output neurons of the
corresponding weights matrix.
def initialize_params(layers_size):
np.random.seed(0) # for reproducibility
params = list()
for i in range(len(layers_size) - 1):
W = np.random.randn(layers_size[i + 1], layers_size[i]) * np.sqrt(
2 / (layers_size[i + 1] + layers_size[i])
)
b = np.zeros((layers_size[i + 1], 1))
params.append(W)
params.append(b)
return params
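A quick empirical check (illustrative only, not required by the exercise): the sample standard deviation of each initialized weight matrix should be close to the Glorot target $\sqrt{2/(n+m)}$.
test_params = initialize_params([7, 10, 1])
for W in test_params[0::2]:  # weight matrices are stored at even positions
    n_out, n_in = W.shape
    print(
        "shape %s: empirical std %.4f, target %.4f"
        % (str(W.shape), W.std(), np.sqrt(2 / (n_in + n_out)))
    )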
Implement a generic feedforward ANN with a function `y = ANN(x, params)`, using
$ReLU$ as activation function.
activation = jax.nn.relu
def ANN(x, params):
layer = x.T
num_layers = int(len(params) / 2 + 1)
weights = params[0::2]
biases = params[1::2]
for i in range(num_layers - 1):
layer = weights[i] @ layer - biases[i]
if i < num_layers - 2:
layer = activation(layer)
return layer.T
params = initialize_params([7, 10, 1])
ANN(x_train[:10, :], params)
Implement the quadratic loss (MSE) function `L = MSE(x, y, params)`.
def MSE(x, y, params):
error = ANN(x, params) - y
return jnp.mean(error * error)
params = initialize_params([7, 10, 1])
print(MSE(x_train, y_train, params))
Implement an $l^2$ regularization term for the ANN weights:
$$
\mathrm{MSW} = \frac{1}{n_{weights}} \sum_{i=1}^{n_{weights}} w_i^2
$$
and define the loss function as
$$
\mathcal{L} = \mathrm{MSE} + \beta \, \mathrm{MSW}
$$
where $\beta$ is a suitable penalization parameter.
def MSW(params):
weights = params[::2]
partial_sum = 0.0
n_weights = 0
for W in weights:
partial_sum += jnp.sum(W * W)
n_weights += W.shape[0] * W.shape[1]
return partial_sum / n_weights
def loss(x, y, params, penalization):
return MSE(x, y, params) + penalization * MSW(params)
print(MSW(params))
print(loss(x_train, y_train, params, 1.0))
Run this cell: we will use this callback to monitor training.
from IPython import display
class Callback:
def __init__(self, refresh_rate=250):
self.refresh_rate = refresh_rate
self.fig, self.axs = plt.subplots(1, figsize=(16, 8))
self.epoch = 0
self.__call__(-1)
def __call__(self, epoch):
self.epoch = epoch
if (epoch + 1) % self.refresh_rate == 0:
self.draw()
display.clear_output(wait=True)
display.display(plt.gcf())
time.sleep(1e-16)
def draw(self):
if self.epoch > 0:
self.axs.clear()
self.axs.loglog(history_loss_train, "b-", label="loss train")
self.axs.loglog(history_loss_valid, "r-", label="loss validation")
self.axs.loglog(history_MSE_train, "b--", label="MSE train")
self.axs.loglog(history_MSE_valid, "r--", label="MSE validation")
self.axs.legend()
self.axs.set_title("epoch %d" % (self.epoch + 1))
## Training
Train an ANN with two hidden layers with 20 neurons each, using 5000 epochs of
the SGD method (with minibatch size 100) with momentum ($\alpha = 0.9$).
Employ a linear decay of the learning rate:
$$
\lambda_k = \max\left(\lambda_{\textnormal{min}}, \lambda_{\textnormal{max}} \left(1
- \frac{k}{K}\right)\right)
$$
with $\lambda_{\textnormal{min}} = 5 \cdot 10^{-3}$, $\lambda_{\textnormal{max}} = 10^{-1}$ and
decay length $K = 1000$.
During training, store both the MSE error and the loss function obtained on the train
and validation sets in 4 lists, respectively called:
- `history_loss_train`
- `history_loss_valid`
- `history_MSE_train`
- `history_MSE_valid`
Test different choices of the penalization parameter $\beta$.
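Before running the training loop below, it can help to visualize the learning-rate schedule defined above (an optional preview, not part of the required steps; the variable names here are illustrative):
lr_max, lr_min, K = 1e-1, 5e-3, 1000
epochs_preview = np.arange(5000)
# Linear decay clipped from below at lr_min.
lr_schedule = np.maximum(lr_min, lr_max * (1 - epochs_preview / K))
plt.plot(epochs_preview, lr_schedule)
plt.xlabel("epoch k")
plt.ylabel("learning rate")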
# Hyperparameters
layers_size = [7, 20, 20, 1]
penalization = 2.0
# Training options
num_epochs = 5000
learning_rate_max = 1e-1
learning_rate_min = 5e-3
learning_rate_decay = 1000
batch_size = 100
alpha = 0.9
########################################
params = initialize_params(layers_size)
grad = jax.grad(loss, argnums=2)
MSE_jit = jax.jit(MSE)
loss_jit = jax.jit(loss)
grad_jit = jax.jit(grad)
n_samples = x_train.shape[0]
history_loss_train = list()
history_loss_valid = list()
history_MSE_train = list()
history_MSE_valid = list()
def dump():
history_loss_train.append(loss_jit(x_train, y_train, params, penalization))
history_loss_valid.append(loss_jit(x_valid, y_valid, params, penalization))
history_MSE_train.append(MSE_jit(x_train, y_train, params))
history_MSE_valid.append(MSE_jit(x_valid, y_valid, params))
dump()
cb = Callback(refresh_rate=500)
velocity = [0.0 for i in range(len(params))]
for epoch in range(num_epochs):
learning_rate = max(
learning_rate_min, learning_rate_max * (1 - epoch / learning_rate_decay)
)
idxs = np.random.choice(n_samples, batch_size)
grads = grad_jit(x_train[idxs, :], y_train[idxs, :], params, penalization)
for i in range(len(params)):
velocity[i] = alpha * velocity[i] - learning_rate * grads[i]
params[i] += velocity[i]
dump()
cb(epoch)
cb.draw()
print("loss (train ): %1.3e" % history_loss_train[-1])
print("loss (validation): %1.3e" % history_loss_valid[-1])
print("MSE (train ): %1.3e" % history_MSE_train[-1])
print("MSE (validation): %1.3e" % history_MSE_valid[-1])
We now want to investigate in more depth the effect of the penalization parameter
$\beta$.
Write a function that, given the penalization parameter, trains the ANN (with the
same setting used above) and returns a dictionary containing the final values of:
- train MSE
- validation MSE
- MSW
# Hyperparameters
layers_size = [7, 20, 20, 1]
# Training options
num_epochs = 5000
learning_rate_max = 1e-1
learning_rate_min = 5e-3
learning_rate_decay = 1000
batch_size = 100
alpha = 0.9
def train(penalization):
params = initialize_params(layers_size)
n_samples = x_train.shape[0]
velocity = [0.0 for i in range(len(params))]
for epoch in range(num_epochs):
learning_rate = max(
learning_rate_min, learning_rate_max * (1 - epoch / learning_rate_decay)
)
idxs = np.random.choice(n_samples, batch_size)
grads = grad_jit(x_train[idxs, :], y_train[idxs, :], params, penalization)
for i in range(len(params)):
velocity[i] = alpha * velocity[i] - learning_rate * grads[i]
params[i] += velocity[i]
return {
"MSE_train": MSE(x_train, y_train, params),
"MSE_valid": MSE(x_valid, y_valid, params),
"MSW": MSW(params),
}
Using the function defined above, store the obtained results for $\beta = 0, 0.25, 0.5,
0.75, \dots, 2$.
results = {
"MSE_train": list(),
"MSE_valid": list(),
"MSW": list(),
}
pen_values = np.arange(0, 2.1, 0.25)
for p in pen_values:
print("training for p = %f..." % p)
res = train(p)
results["MSE_train"].append(res["MSE_train"])
results["MSE_valid"].append(res["MSE_valid"])
results["MSW"].append(res["MSW"])
Plot the trend of the three quantities as functions of $\beta$.
_, axs = plt.subplots(1, 3, figsize=(12, 6))
axs[0].plot(pen_values, results["MSE_train"], "o-")
axs[1].plot(pen_values, results["MSE_valid"], "o-")
axs[2].plot(pen_values, results["MSW"], "o-")
axs[0].set_title("MSE_train")
axs[1].set_title("MSE_valid")
axs[2].set_title("MSW")
Plot the _Tikhonov L-curve_, which is - in this context - the curve "train MSE" versus
"MSW".
plt.plot(results["MSW"], results["MSE_train"], "o-")
plt.xlabel("MSW")
plt.ylabel("MSE_train")
# 1st order Training (optimization) methods
import numpy as np
import matplotlib.pyplot as plt
import time
import jax.numpy as jnp
import jax
Let us consider the following function
$$
f(x) = e^{-\frac{x}{10}}\sin(x) + \frac{1}{10} \cos(\pi x)
$$
defined over the interval $[0, 10]$.
f = lambda x: np.sin(x) * np.exp(-0.1 * x) + 0.1 * np.cos(np.pi * x)
a, b = 0, 10
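As a quick visual check (optional, not requested by the text), we can plot $f$ over the interval $[a, b]$:
xx = np.linspace(a, b, 500)
plt.plot(xx, f(xx))
plt.xlabel("x")
plt.ylabel("f(x)")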
Define a function `get_training_data` that returns a collection of `N` training samples.