Trend is a long-term increase or decrease in the data.
Seasonality is a pattern of fluctuations with a fixed, known period (e.g., time of year or day of week).
Cycle is rises and falls without a fixed frequency, often linked to economic conditions.
gg_season() > seasonal plot: plots the data against the individual seasons (e.g., months or
quarters) in which they were observed, with one line per cycle
gg_season(name, labels = "both") +
labs(y = "$ (millions)",
title = "....")
gg_season(name, Demand, period = "day") + # period can be "day", "week" or "year"
theme(legend.position = "none") # removes the legend (optional)
Setting period helps focus on one seasonal pattern at a time (daily, weekly or yearly), which makes it
easier to see the trends and changes for that specific seasonality.
gg_subseries() > creates subseries plots to visualize individual seasonal cycles over time,
highlighting variations across periods.
gg_subseries(name)
The blue horizontal lines indicate the means for each month. This form of plot enables the
underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality
over time. It is especially useful in identifying changes within particular seasons.
ggplot() > used to create plots by mapping data to visual elements.
ggplot(data, aes(x = …, y = …)) +
geom_point() # Add scatter points
ggplot(visitors, aes(x = Quarter, y = Trips)) +
geom_line() + # add line plots
facet_grid(vars(State), scales = "free_y")
geom_point(): For scatter plots.
geom_line(): For line plots.
geom_bar(): For bar charts.
geom_histogram(): For histograms.
cor() > calculates the correlation coefficient between two numerical variables
cor(name$column_1, name$column_2) # e.g. cor(vic_elec_2014$Temperature, vic_elec_2014$Demand)
A result of 0.85 means that there is a strong positive correlation between Temperature and Demand
in the year 2014.
A result of -0.4 means that there is a weak negative correlation.
A result of 0.05 means that there is almost no linear relationship between the variables.
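A minimal base-R sketch of the extreme cases, using made-up vectors rather than vic_elec_2014 (the names x and z and their values are illustrative assumptions). It also shows why a value near 0 only rules out a linear relationship:

```r
# Made-up data, not from vic_elec_2014
x <- 1:20
z <- seq(-5, 5, by = 0.5)      # symmetric around zero

r_pos  <- cor(x, 2 * x + 1)    # exact increasing line: correlation 1
r_neg  <- cor(x, -3 * x)       # exact decreasing line: correlation -1
r_none <- cor(z, z^2)          # clear U-shaped relationship, yet correlation about 0

print(c(positive = r_pos, negative = r_neg, none = r_none))
```

The last case is why a scatter plot should be checked alongside the coefficient.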
ggpairs() > Creates a pairwise scatterplot matrix to visualize relationships between multiple
numeric variables in a dataset.
library(GGally)
ggpairs(data, columns, mapping, ...)
ggpairs(pivot_wider(dataset, values_from = Trips, names_from = State),
columns = 2:9)
ggpairs(data, columns = c(2, 3, 5))
Range:
+1 = Perfect positive correlation (as one increases, the other
increases).
-1 = Perfect negative correlation (as one increases, the other
decreases).
0 = No correlation.
Stars indicate significance (p-value of the correlation test):
*** = highly significant (p < 0.001)
** = moderately significant (p < 0.01)
* = weakly significant (p < 0.05)
gg_lag() > used to visualize the relationship between a time series variable and its lagged
versions (e.g., comparing the values of a variable with its previous values over time).
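library(feasts) # gg_lag() comes from feasts, not GGally
gg_lag(name, column, geom = "point") # geom can be "point" or "path" (the default)
If the series has a strong seasonal effect, the points in the panels for the
seasonal lags should lie very close to the diagonal (the 45-degree line)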
If your data are daily or monthly, you may want to extract the quarterly
information first (using functions like quarter() and year() from
lubridate)
…. <- filter(name, year(Quarter) >= 2000)
ACF() > computes the autocorrelation of a time series at different lags.
ACF(name, column_name, lag_max = 9)
autoplot(ACF(recent_production, Beer))
The dashed blue lines mark the bounds ±1.96/√T (T = length of the series); the band
between them works like a 95% confidence interval for a white noise series
Autocorrelation values inside the band suggest no significant correlation at that lag.
Autocorrelation values outside the band suggest that the observed autocorrelation is
statistically significant and unlikely to be due to random noise
Spikes at multiples of the seasonal period (e.g., lags 4, 8, ... for quarterly data) are due to seasonality
To find the percentage, divide the number of spikes that cross the blue lines (above or below) by the
total number of spikes.
< 5%: the data is likely random, with no meaningful autocorrelation.
> 5% but not by much: the result might be inconclusive, potentially due to random variations.
> 20%: significant autocorrelation, suggesting that there is a real, non-random relationship in the data
across those lags.
Time series that show no autocorrelation are called white noise. For a white noise series, we expect
each autocorrelation to be close to zero: 95% of the spikes in the ACF should lie within the blue lines,
so fewer than about 5% should cross them.
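A rough sketch of the percentage check, using base R's acf() on simulated data instead of feasts::ACF() on a tsibble. The series wn and seasonal, the seed, and the ±1.96/√T bound are assumptions made for illustration:

```r
set.seed(42)
n <- 200
wn <- rnorm(n)                                          # simulated white noise
seasonal <- rep(c(1, 5, 2, 8), times = 50) + rnorm(n, sd = 0.1)  # period-4 pattern

bound <- 1.96 / sqrt(n)                                 # position of the blue lines

r_wn <- acf(wn, lag.max = 20, plot = FALSE)$acf[-1]     # drop lag 0 (always 1)
r_seasonal <- acf(seasonal, lag.max = 20, plot = FALSE)$acf[-1]

# Share of spikes that cross the blue lines, above or below
share_wn <- mean(abs(r_wn) > bound)
share_seasonal <- mean(abs(r_seasonal) > bound)

print(c(white_noise = share_wn, seasonal = share_seasonal))
```

For the seasonal series the spike at lag 4 is far outside the bounds, while the white noise series should show few or no crossings.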
features() > designed to extract features (i.e., summary statistics or transformations) from a
time series, such as trends, seasonality, or optimal parameters for transformations.
guer <- features(aus_production, Gas, features = guerrero)
lambda <- pull(guer, lambda_guerrero)
print(lambda)
box_cox() > applies the Box-Cox transformation to the Gas time series using the λ value.
box_cox(aus_production$Gas, lambda)
or
autoplot(name, box_cox(Gas, lambda))
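For reference, a hand-written sketch of the textbook Box-Cox formula for positive values. The helper box_cox_manual is hypothetical and is not the package function, which may treat zero or negative values differently:

```r
# Textbook Box-Cox for y > 0: log(y) when lambda is 0, (y^lambda - 1) / lambda otherwise
box_cox_manual <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 10, 100, 1000)       # made-up positive values

bc_log  <- box_cox_manual(y, 0)     # same as log(y)
bc_sqrt <- box_cox_manual(y, 0.5)   # square-root-like compression
bc_none <- box_cox_manual(y, 1)     # just y - 1: shape of the series unchanged

print(rbind(bc_log, bc_sqrt, bc_none))
```

A lambda of 1 leaves the shape untouched, and smaller values compress large observations more strongly.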
model() > allows you to create and compare multiple models on your dataset efficiently
model(name, model_name = model_function(Employed))
model(name, classical_decomposition(Employed, type = "additive")) # or type = "multiplicative"
Additive > the seasonal swings stay about the same size whatever the level of the series; for example,
retail employment may have fixed seasonal peaks whose size doesn't change with the level of employment.
Multiplicative > the seasonal swings scale with the level of the series; for example, if retail employment
grows over time and the seasonal peaks get larger as the trend increases, a multiplicative model might be
more appropriate.
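A small sketch of the difference, using base R's decompose() on the built-in AirPassengers series rather than classical_decomposition() on the employment data (the dataset choice is an assumption for illustration). In the additive form the components add up to the data; in the multiplicative form they multiply:

```r
add <- decompose(AirPassengers, type = "additive")
mul <- decompose(AirPassengers, type = "multiplicative")

# The moving-average trend is NA at both ends, so compare only complete rows
ok <- !is.na(add$trend)

recon_add <- add$trend + add$seasonal + add$random   # data = trend + seasonal + remainder
recon_mul <- mul$trend * mul$seasonal * mul$random   # data = trend * seasonal * remainder

print(all.equal(as.numeric(recon_add[ok]), as.numeric(AirPassengers[ok])))
print(all.equal(as.numeric(recon_mul[ok]), as.numeric(AirPassengers[ok])))
```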
components() > takes a decomposed time series object and extracts the individual
components of the decomposition, such as the trend, seasonal, and remainder components.
components(name)
Now you can manually choose which column (Employed) to plot and add the trend line on top
autoplot(as_tsibble(components(dcmp)), Employed)
or
components_tsibble <- as_tsibble(components(dcmp))
autoplot(components_tsibble, Employed)
To add the orange line, append with +:
geom_line(aes(y = trend), colour = "orange") # any colour works, e.g. "blue"
or you can automatically plot all components (trend, seasonal, remainder) in separate panels
autoplot(components(dcmp))
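The grey bars on the left of each panel all represent the same length, drawn on
that panel's own scale, so they show the relative size of each component.
A longer bar means that component varies little compared with the data; a long
bar in the remainder panel indicates that the decomposition has captured most
of the structure in the data.
A shorter bar means that component accounts for a large share of the variation;
a short bar in the remainder panel means the data contains substantial irregular
fluctuations that are not explained by the trend or seasonal components.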
slide_dbl() > designed for sliding window calculations over a numeric vector. It is particularly
useful for moving averages, rolling sums, and other calculations applied to subsets of data.
slide_dbl(.x, .f, ..., .before = 0, .after = 0, .complete = FALSE) # from library(slider)
slide_dbl(data, mean, .before = 2, .after = 2, .complete = TRUE)
For the orange line (`5-MA` is the column that holds the slide_dbl() result):
autoplot(aus_exports, Exports) +
geom_line(aes(y = `5-MA`), colour = "orange")
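To see what the centred 5-term window computes, here is a base-R sketch on a made-up vector, not the slider call itself. Both versions return NA where the window is incomplete, as .complete = TRUE does:

```r
x <- c(2, 4, 6, 8, 10, 12, 14)   # made-up values

# Centred 5-term moving average: 2 values before, the value itself, 2 values after
ma5 <- as.numeric(stats::filter(x, rep(1 / 5, 5), sides = 2))

# The same calculation written out window by window
ma5_manual <- sapply(seq_along(x), function(i) {
  idx <- (i - 2):(i + 2)
  if (min(idx) < 1 || max(idx) > length(x)) NA else mean(x[idx])
})

print(ma5)   # two NAs at each end, averages of 6, 8 and 10 in the middle
```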
STL() > Seasonal and Trend decomposition using Loess is a powerful and flexible method
for decomposing a time series
model(name,
STL(Employed ~ trend(window = xxx)+season(window = xxx), robust = TRUE))
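The trend window is the number of consecutive observations used to estimate the trend-cycle; a larger
window averages over a wider range of values and gives a smoother, longer-term trend.
The season window is the number of consecutive seasonal cycles (e.g., years) used to estimate each seasonal
value; smaller values let the seasonal pattern change more quickly, while window = "periodic" forces it
to be identical in every cycle. Both windows should be odd numbers.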
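A sketch of the same idea with base R's stl(), where s.window and t.window play the role of the season and trend windows. The built-in AirPassengers series, the log transform and the window values 13 and 21 are illustrative assumptions, not recommendations:

```r
y <- log(AirPassengers)   # log makes the seasonal swings roughly additive

fit <- stl(y, s.window = 13, t.window = 21, robust = TRUE)
parts <- fit$time.series  # columns: seasonal, trend, remainder

# STL is additive: the three components add back up to the series
recon <- parts[, "seasonal"] + parts[, "trend"] + parts[, "remainder"]
print(all.equal(as.numeric(recon), as.numeric(y)))

# A fixed seasonal pattern, the analogue of season(window = "periodic")
fit_fixed <- stl(y, s.window = "periodic")
```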
features() > A feature is any numerical summary of the data, such as the mean, median,
maximum, minimum, or others.
features(tourism, Trips, list(mean = mean))
This line tells R to calculate the mean of the Trips column from the tourism dataset.
arrange() > After calculating the mean, you might want to sort the result in ascending or
descending order. arrange allows you to do this by a specified column.
arrange(name, mean)
This line takes the result from features() (which includes the mean) and sorts it by the mean
values.
model() > build statistical models for time series data
model_fit <- model(data, MODEL_TYPE(y ~ x))
library(fable)
fit <- model(name, TSLM(Employed ~ trend()))
report(fit)
Supports different types of models, such as:
● ARIMA (Auto-Regressive Integrated Moving Average)
● ETS (Exponential Smoothing State Space Model)
● STL (Seasonal-Trend Decomposition)
● TSLM (Time Series Linear Model).
accuracy() > when we build a model, we need to check how well it predicts the data before
trusting it for forecasting
fit <- model(....)
accuracy(fit)
or
accTable <- accuracy(beer_fc, recent_production)
select(accTable, .model, RMSE, MAE, MAPE)
Smaller values are better
# Example Output:
# ME RMSE MAE MPE MAPE
# -12.34 15.67 10.45 -1.2% 3.4%
MAE → Easy to understand and good for general errors.
RMSE → Highlights larger errors (use if big mistakes are more important).
MAPE → Best for percentage-based comparisons across datasets with different scales.
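The measures can also be computed by hand. A minimal sketch with made-up actual and predicted values (the vectors are assumptions, not output of accuracy()):

```r
actual    <- c(100, 110, 120, 130)   # made-up observations
predicted <- c(90, 115, 120, 150)    # made-up forecasts

e <- actual - predicted              # forecast errors: 10, -5, 0, -20

me   <- mean(e)                      # mean error: -3.75 (sign shows bias)
mae  <- mean(abs(e))                 # mean absolute error: 8.75
rmse <- sqrt(mean(e^2))              # root mean squared error, penalises the -20 most
mape <- mean(abs(e / actual)) * 100  # mean absolute percentage error, in percent

print(c(ME = me, MAE = mae, RMSE = rmse, MAPE = mape))
```

RMSE is never smaller than MAE, and the absolute value of ME is never larger than MAE, which gives a quick sanity check on any accuracy table.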
forecast() > used to predict future values based on a fitted time series model
forecast(name, h = 12)
forecast(fit, h = "3 years")
autoplot()
filter_index() > extract a subset of data for analysis or visualization.
filter_index(name, "1970 Q1" ~ "2004 Q4") # select data between the first quarter of 1970 and the fourth quarter of 2004
filter_index(name, "2010" ~ "2012") # extracts data for the years 2010, 2011 and 2012
model() > MEAN, NAIVE, SEASONAL NAIVE and DRIFT
mean_fit <- model(bricks, MEAN(Bricks))
tidy(mean_fit)
results_list <- mean_fit$"MEAN(Bricks)"[[1]]
mean_results <- results_list$fit
or all together like this: mean_results <- mean_fit$"MEAN(Bricks)"[[1]]$fit
then we have to forecast and autoplot
mean_fc <- forecast(mean_fit, h = 12)
bricks_mean = mutate(bricks,hline = mean_fc$.mean[1]) # add a dashed line
autoplot(mean_fc, bricks, level = NULL) +
autolayer(bricks_mean,hline,linetype='dashed',color='blue')
naive_fit <- model(bricks,NAIVE(Bricks))
naive_fc <- forecast(naive_fit, h = 12)
autoplot(naive_fc, bricks, level = NULL)
For Naive forecasting, the forecast line is flat: the forecast
for every future point is the same as the last observed
value.
snaive_fit <- model(bricks,SNAIVE(Bricks ~ lag("year")))
snaive_fc <- forecast(snaive_fit, h = 12)
autoplot(snaive_fc, bricks, level = NULL)
This is useful for series with no trend but high seasonality
drift_fit <- model(bricks,RW(Bricks ~ drift()))
drift_fc <- forecast(drift_fit, h = 12)
autoplot(drift_fc, bricks, level = NULL)
This is useful for series with no seasonal effect but with a trend
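What the four benchmark methods produce can be sketched by hand in base R. The quarterly vector y below is made up, and this illustrates the point forecasts only, not the fable implementation:

```r
y <- c(10, 14, 8, 12, 11, 15, 9, 13)  # made-up data: two years of quarterly values
n <- length(y)
m <- 4                                # seasonal period (quarters per year)
h <- 4                                # forecast horizon

mean_fc   <- rep(mean(y), h)                           # MEAN: average of all observations
naive_fc  <- rep(y[n], h)                              # NAIVE: last observed value
snaive_fc <- y[n - m + ((seq_len(h) - 1) %% m) + 1]    # SNAIVE: same quarter of the last year
drift_fc  <- y[n] + seq_len(h) * (y[n] - y[1]) / (n - 1)  # DRIFT: line through first and last points

print(rbind(mean_fc, naive_fc, snaive_fc, drift_fc))
```

Here the mean forecast is 11.5, the naive forecast is 13, the seasonal naive forecast repeats 11, 15, 9, 13, and the drift forecast rises by 3/7 per step from 13.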
We can do all together:
model(name,
Mean = MEAN(Beer),
Naive = NAIVE(Beer), # or RW(Beer)
Seasonal_naive = SNAIVE(Beer),
Drift = RW(Beer ~ drift())
)
augment() > Retrieves details about the model, including actual values, fitted values (.fitted)
and residuals (.resid).
mean_fitted <- augment(beer_fit1)
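The relationship between the three columns can be sketched by hand for a mean model. The vector y is made up, and the data frame only mimics the .fitted and .resid column names:

```r
y <- c(10, 14, 8, 12)                 # made-up observations

fitted <- rep(mean(y), length(y))     # a mean model fits every point with the overall mean (11)
resid  <- y - fitted                  # residual = actual value - fitted value

aug <- data.frame(y = y, .fitted = fitted, .resid = resid)
print(aug)
```

The residuals here are -1, 3, -3 and 1, and for a mean model they always sum to zero.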
For visualisation:
ggplot(mean_fitted, aes(x = Quarter)) +
geom_line(aes(y = Beer),color='black') +
geom_line(aes(y = .fitted),color='red')
or
autoplot(mean_fitted,.vars = Beer) + #