# Methods

### GLEAM

This forecasting method combines digital surveillance data, historical influenza-like illness (ILI) data, and a stochastic generative epidemic model (GLEAM) to provide weekly probabilistic predictions for the influenza season. It consists of three stages:

#### Stage 1: Tracking seasonal influenza with social and digital data mining

In the first stage, we collect information on influenza-like illness cases from two sources of geolocalized data:

- We mine Twitter’s gardenhose (a live stream of about 10% to 20% of the global volume), extracting ILI-related tweets in the country of interest. This task is performed by applying three filters to the raw data: geolocalization, language detection, and keyword matching.

- We estimate ILI activity using online self-reporting platforms: Influenzanet in Europe and FluTracking in Australia. Voluntary participants provide information about their location, demographics, and influenza-related health status. The high quality and reliability of these data allow us to estimate the number of ILI cases in the country of interest.
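The three Twitter filters described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the `Tweet` record, its `country` and `lang` fields (assumed to come from upstream geolocalization and language-detection steps), and the keyword list are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the terms actually matched are not given here.
ILI_KEYWORDS = frozenset({"flu", "influenza", "fever", "cough"})


@dataclass
class Tweet:
    text: str
    country: str  # assumed output of a geolocalization step (country code)
    lang: str     # assumed output of a language detector (language code)


def is_ili_tweet(tweet, country, lang, keywords=ILI_KEYWORDS):
    """Apply the three filters in order: location, language, keyword match."""
    if tweet.country != country:
        return False
    if tweet.lang != lang:
        return False
    # Match whole words so that "fluent" does not count as "flu".
    words = set(re.findall(r"[a-z]+", tweet.text.lower()))
    return not keywords.isdisjoint(words)


def count_ili_tweets(tweets, country, lang):
    """Number of tweets that pass all three filters."""
    return sum(is_ili_tweet(t, country, lang) for t in tweets)
```

Counting the matching tweets per week would give a raw activity signal for the country of interest.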

#### Stage 2: Model initialization and numerical simulations

In the second stage, we estimate the initial conditions needed to run GLEAM and perform the model simulations, exploring the model’s parameter space. Using Latin hypercube sampling, we explore a four-dimensional parameter space defined by a season-dependent relation between the actual number of flu cases and the buzz generated on Twitter, the residual immunity in the population, the transmissibility of the disease, and the recovery rate.
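As a rough sketch of how Latin hypercube sampling covers a four-dimensional space, the snippet below splits each parameter range into equal-width strata, draws one value per stratum, and shuffles each dimension independently. The parameter names and ranges are hypothetical placeholders, not the values used with GLEAM.

```python
import random

# Hypothetical names and ranges, for illustration only.
PARAM_RANGES = {
    "twitter_scaling": (0.0, 1.0),
    "residual_immunity": (0.0, 0.5),
    "transmissibility": (1.0, 2.5),
    "recovery_rate": (0.2, 0.6),
}


def latin_hypercube(n_samples, ranges, seed=0):
    """Return n_samples parameter sets, one value per stratum in every dimension."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        # One uniform draw inside each of n_samples equal-width strata of [0, 1).
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(unit)
        columns[name] = [low + u * (high - low) for u in unit]
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]
```

Each returned dictionary would correspond to one simulation run; compared with plain random sampling, every parameter is guaranteed to be probed across its whole range.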

#### Stage 3: Model selection

In the final stage, we select the parameter sets that best describe the historical surveillance data for the current season. To do so, we compare the simulation results for each sampled point of the parameter space against the real data, using a likelihood ratio analysis.
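One way such a comparison could look is sketched below. The details are assumptions made for illustration: weekly counts are scored with a Poisson likelihood, and a parameter set is kept when its log-likelihood ratio to the best-scoring set stays below a hypothetical threshold `max_log_ratio`. The error model and threshold actually used are not specified here.

```python
import math


def poisson_log_likelihood(observed, simulated):
    """Log-likelihood of observed weekly counts given simulated mean counts."""
    total = 0.0
    for k, mu in zip(observed, simulated):
        mu = max(mu, 1e-9)  # guard against log(0) for empty simulated weeks
        total += k * math.log(mu) - mu - math.lgamma(k + 1)
    return total


def select_models(observed, simulations, max_log_ratio=3.0):
    """Keep parameter sets whose log-likelihood ratio to the best one is small.

    `simulations` maps a parameter-set label to its simulated weekly counts.
    The result is ordered from best to worst.
    """
    scores = {
        name: poisson_log_likelihood(observed, sim)
        for name, sim in simulations.items()
    }
    best = max(scores.values())
    kept = [name for name, score in scores.items() if best - score <= max_log_ratio]
    return sorted(kept, key=lambda name: -scores[name])
```

For example, with observed counts `[10, 20, 40]`, a simulation that peaks in the wrong week is discarded, while one that is off by a case or two per week is retained.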

### Time series forecasting with stochastic models

#### I. Autoregressive model (AR)

An AR model of order p, AR(p), assumes that the future value of the forecast variable is a linear combination of the p most recent observations, a random error, and a constant term. For more detailed information, see here.
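In standard notation (the symbols are introduced here for illustration), with $y_t$ the observation at time $t$, $c$ the constant term, $\varphi_1, \dots, \varphi_p$ the autoregressive coefficients, and $\varepsilon_t$ the random error, an AR(p) model reads:

$$
y_t = c + \sum_{i=1}^{p} \varphi_i \, y_{t-i} + \varepsilon_t
$$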

#### II. Moving average model (MA)

An MA model of order q, MA(q), uses past errors as explanatory variables. The model assumes that the next value of a time series is a linear combination of the series mean, the current random shock, and the random shocks of the q previous time steps. The random shocks are assumed to be white noise. Because a finite-order MA process is always stationary, an MA(q) model is only appropriate for stationary time series. For more details, see here.
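In standard notation (the symbols are introduced here for illustration), with $\mu$ the mean of the series, $\varepsilon_t$ the white-noise shock at time $t$, and $\theta_1, \dots, \theta_q$ the moving-average coefficients, an MA(q) model reads:

$$
y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
$$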

#### III. Autoregressive moving average model (ARMA)

AR(p) and MA(q) models can be combined into a more general model, ARMA(p, q), for univariate time series modeling. For more information, see here.
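In standard notation (the symbols are introduced here for illustration), with $c$ a constant, $\varphi_i$ the autoregressive coefficients, $\theta_j$ the moving-average coefficients, and $\varepsilon_t$ white noise, an ARMA(p, q) model reads:

$$
y_t = c + \sum_{i=1}^{p} \varphi_i \, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
$$

Setting $q = 0$ gives an AR(p) model, and setting $p = 0$ gives an MA(q) model.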

#### IV. Autoregressive integrated moving average model (ARIMA)

Time series that contain a trend may be non-stationary. An ARIMA(p, d, q) model is a generalization of an ARMA model that handles such series: it transforms a non-stationary time series into a stationary one by differencing the data d times, then fits an ARMA(p, q) model to the differenced series. For more information, see here.
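The differencing step can be illustrated in a few lines. The helper name `difference` is chosen here for illustration; each pass replaces the series with the changes between consecutive observations.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: y'_t = y_t - y_{t-1}."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


linear = [3, 5, 7, 9, 11]        # linear trend
quadratic = [1, 4, 9, 16, 25]    # quadratic trend

print(difference(linear, d=1))     # [2, 2, 2, 2]
print(difference(quadratic, d=2))  # [2, 2, 2]
```

A linear trend disappears after one round of differencing and a quadratic trend after two, which is why d is usually small in practice. Each round shortens the series by one observation.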

#### V. Seasonal autoregressive integrated moving average model (SARIMA)

The ARIMA model is designed for non-seasonal, non-stationary time series. Many time series, however, show a seasonal pattern that repeats every s observations. To model seasonality, the ARIMA model can be generalized to a SARIMA(p, d, q)×(P, D, Q)s model, which can be formulated as an ARIMA(p, d, q) model whose residuals are themselves modeled as an ARIMA(P, D, Q) process at the seasonal lag s. For more details, see here.
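A compact way to write this uses the backshift operator $B$, defined by $B\,y_t = y_{t-1}$. The notation below is the standard textbook form and is introduced here for illustration:

$$
\varphi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\, y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
$$

Here $\varphi_p(B) = 1 - \varphi_1 B - \dots - \varphi_p B^p$ and $\theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^q$ are the non-seasonal AR and MA polynomials, $\Phi_P(B^s)$ and $\Theta_Q(B^s)$ are their seasonal counterparts of orders $P$ and $Q$ in powers of $B^s$, and $\varepsilon_t$ is white noise. The factor $(1 - B^s)^D$ is seasonal differencing: for $D = 1$ it replaces $y_t$ with $y_t - y_{t-s}$.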

#### VI. Model selection

To select the orders p, d, q, etc. of the above models, we use the Akaike Information Criterion (AIC). With the best-scoring model, we then forecast the ILI data two weeks ahead.
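The idea of AIC-based order selection can be sketched for the simplest case, a pure AR(p) model. This is an illustration under stated assumptions rather than the procedure used here: the coefficients are estimated with the Yule-Walker equations (solved by the Levinson-Durbin recursion), the AIC is computed up to an additive constant from the estimated innovation variance, and the function names are hypothetical. A full search would also cover d, q, and the seasonal orders.

```python
import math


def autocovariances(x, max_lag):
    """Biased sample autocovariances r[0..max_lag] of the series x."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[t] * c[t - k] for t in range(k, n)) / n for k in range(max_lag + 1)]


def select_ar_order(x, max_order):
    """Return (best_p, aic_by_order) for AR(0..max_order) Yule-Walker fits."""
    n = len(x)
    r = autocovariances(x, max_order)
    phi = []        # AR coefficients of the current order
    sigma2 = r[0]   # innovation variance of the order-0 model
    # AIC up to a constant: n * log(variance) + 2 * (number of parameters).
    aic = {0: n * math.log(sigma2) + 2}
    for p in range(1, max_order + 1):
        # Levinson-Durbin step: reflection coefficient for order p.
        kappa = (r[p] - sum(phi[j] * r[p - 1 - j] for j in range(p - 1))) / sigma2
        phi = [phi[j] - kappa * phi[p - 2 - j] for j in range(p - 1)] + [kappa]
        sigma2 *= 1 - kappa ** 2
        aic[p] = n * math.log(sigma2) + 2 * (p + 1)
    best_p = min(aic, key=aic.get)
    return best_p, aic
```

The penalty term `2 * (p + 1)` grows with the order, so a higher order is preferred only when it reduces the residual variance enough to offset the extra parameters.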