Estimation and Prediction¶
This notebook walks you through building nowcasting models for GDP growth. The goal is for you to understand how each model works mechanically, how it is fit on training data, how predictions are generated, and what parameters you can tune.
The workflow is the same for every model:
- Split data into a training sample and a test sample
- Fit the model on the training data
- Predict on the test data
- Evaluate performance using RMSE
The helper functions used in the full nowcasting pipeline (fit_* and predict_*) are shown in the Helper Function Reference section below. Your task is to replicate the OLS exercise for the remaining models, using those functions as a blueprint.
OLS¶
The nowcasting pipeline uses a set of helper functions (one to fit each model and one to generate predictions). Understanding these functions lets you swap models and adapt the pipeline to different variables, environments, or questions.
For OLS, the fit function wraps scikit-learn's LinearRegression:
ols_fit, ols_vars = fit_ols(yt, xt)
Internally, fit_ols does two things:
model = LinearRegression()
model.fit(xtrain, ytrain)
It returns the fitted model object and the list of variables used during training. The prediction function then applies model.predict(X) to new (test) data and extracts the scalar forecast for each period.
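For orientation, here is a minimal sketch of how the two helpers are chained together. It assumes yt and xt are the training target and predictors, X_new is a one-row frame of the latest data, and pred_dict / l are the pipeline's bookkeeping container and lag index (see the Helper Function Reference below for the actual definitions):
# Illustrative only — the full pipeline loops this over vintages and lag horizons
ols_fit, ols_vars = fit_ols(yt, xt)                  # fit on the training sample
pred_dict = {0: []}                                  # predictions keyed by lag/horizon
pred_dict = predict_ols(ols_fit, X_new, ols_vars,
                        date=None, pred_dict=pred_dict, l=0)
nowcast = pred_dict[0][-1]                           # scalar forecast for this vintage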
The exercises below strip away the pipeline bookkeeping so you can focus on the model mechanics directly. A simplified dataset is constructed first, and then you will fit each model by hand.
Data¶
The cell below loads the full dataset (processed in earlier sessions), applies CPI deflation, seasonal adjustment, and converts all series to growth rates. It then filters out variables with long publication lags and removes highly correlated series. The result is a clean panel ready for modelling.
Run this cell as-is. You do not need to modify it.
# =========================================================
# User settings (edit here as necessary)
# =========================================================
import pandas as pd
from datetime import datetime
# ---------------------------------------------------------
# 0) DATA/METADATA FILE NAMES
# ---------------------------------------------------------
data_file = "data.csv"
meta_file = "meta.csv"
# ---------------------------------------------------------
# 1) DEFINE TRAIN / TEST WINDOW
# ---------------------------------------------------------
# These dates define the sample splits used for model training and evaluation.
# - Training sample: [train_start_date, test_start_date)
# - Test sample: [test_start_date, test_end_date]
train_start_date = "2015-03-01"
test_start_date = "2023-06-01"
# Convert to Timestamp for safer comparisons later
train_start_date = pd.to_datetime(train_start_date)
test_start_date = pd.to_datetime(test_start_date)
# ---------------------------------------------------------
# 2) DEFINE OUT-OF-SAMPLE FORECAST TARGET (LAST DATE / VINTAGE)
# ---------------------------------------------------------
# This is the date for which you want to produce the final nowcast/forecast.
# Use the first day of the month (or the appropriate GDP reference month).
desired_date = pd.to_datetime("2026-03-01")
# ---------------------------------------------------------
# 3) DEFINE OUTPUT FILENAMES (timestamped)
# ---------------------------------------------------------
# Timestamp output files so each run creates a unique set of results.
today = datetime.today()
date_str = today.strftime("%Y%m%d")
forecast_file = f"{date_str} 3_nwcst Tab fcst.xlsx" # model forecasts / nowcasts
performance_file = f"{date_str} 3_nwcst Tab perf.xlsx" # model performance metrics
outofsample_file = f"{date_str} 3_nwcst Tab oos.xlsx" # full OOS prediction panel
# (Optional) If you want to resume / compare against a previous run,
# set last_performance to the filename of the prior performance workbook.
# Example: last_performance = "perf_20251207.xlsx"
last_performance = None
# ---------------------------------------------------------
# 4) MODEL SETTINGS
# ---------------------------------------------------------
# Target GDP series ID
target_variable = "RGDP0000"
# Prediction horizon (vintages) to evaluate:
# - negative values: backcasts (e.g., -1 = one period back)
# - 0: nowcast (current period)
# - positive values: forecasts (e.g., +1 = one period ahead)
lags = list(range(-2, 3))
# ---------------------------------------------------------
# 5) TARGET MODELS
# ---------------------------------------------------------
# Parameter: iterations in GBT, DT, RF
_iter_ = 10
# Parameter: LSTM n_models and train_episodes
_iter_lstm = 10
_epoch_lstm = 50
# =========================================================
# =========================================================
# =========================================================
# Nowcasting pipeline: imports, paths, data loading, and preprocessing
# =========================================================
# This section:
# 1) Imports all required libraries for the nowcasting toolkit (ML + time-series + LSTM)
# 2) Defines local folder paths for raw inputs, processed data, and outputs
# 3) Loads the full dataset and metadata
# 4) Deflates nominal variables using CPI (base year = 2019 average)
# 5) Seasonally adjusts series by subtracting the seasonal component from a seasonal decomposition
# 6) Converts all series to growth rates (percent change)
# 7) Cleans the final modeling dataset and prints library versions for reproducibility
# =========================================================
import os, json, time, warnings, re
from copy import deepcopy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
# -----------------------------
# Models / ML utilities
# -----------------------------
from sklearn import linear_model, tree
from sklearn.linear_model import (
LinearRegression, RidgeCV, ElasticNet, Lasso, Lars
)
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
# Optional time-series tools (if used downstream)
import pmdarima as pm
# Deep learning (LSTM nowcasting)
import torch
from nowcast_lstm.LSTM import LSTM
# Statsmodels seasonal adjustment
from statsmodels.tsa.seasonal import seasonal_decompose
# Reduce noisy warnings from some estimators / decomposition
warnings.filterwarnings("ignore", category=UserWarning)
# =========================================================
# Paths
# =========================================================
# Define local project directories (relative to current working directory)
d = os.getcwd() + "/"
PATH_RAW = os.path.join(d, "..", "raw") # raw inputs (downloads / intermediate CSVs)
PATH_DATA = os.path.join(d, "..", "data") # processed modeling tables
# Output folder for model results (forecasts, performance tables)
PATH_OUTPUT = os.path.join(d, "..", "output/", f"{date_str}")
# In case it is run several times in a day:
# PATH_OUTPUT = os.path.join(d, "..", "output/", f"{date_str}_2")
# Create folder if it does not exist
os.makedirs(PATH_OUTPUT, exist_ok=True)
print("# =========================================================")
print("Key Directories:")
print(f"Raw inputs: {PATH_RAW}")
print(f"Modeling data: {PATH_DATA}")
print(f"Model outputs: {PATH_OUTPUT}")
print("# =========================================================")
# =========================================================
# Load data and metadata
# =========================================================
# Main modeling dataset (indexed by date)
data = pd.read_csv(
os.path.join(PATH_DATA, data_file ),
parse_dates=["date"],
index_col="date"
)
test_end_date = data[target_variable].dropna().index.max()
# Metadata describing each variable (frequency, nominal/real flags, etc.)
metadata = pd.read_csv(os.path.join(PATH_DATA, meta_file))
# Key column names and CPI series ID used for deflation
meta_col = "ticket" # variable identifier column in metadata
cpi = "RCPI0000" # CPI series used to deflate nominal variables
# =========================================================
# Deflate nominal variables using CPI (base year: 2019 average, STATINJA 2019=100)
# =========================================================
# Many series are nominal; deflating them puts everything in real terms.
# We compute a CPI base as the average CPI level in 2019 (STATINJA 2019=100 → base = 100).
base = data[cpi][data.index.year == 2019].mean()  # base year 2019 (STATINJA, 2019=100)
# Identify variables flagged as nominal in metadata
nominal = metadata.loc[metadata["nominal"] == 1][meta_col].values
# Deflate: Real_t = Nominal_t / CPI_t * CPI_base
for column in data.columns:
    if column in nominal:
        data[column] = data[column] / data[cpi] * base
# Drop CPI
data.drop(columns=[cpi], inplace=True)
# Drop GDP components
data = data.drop(columns=[col for col in data.columns if 'RGDP' in col and col != 'RGDP0000'])
# =========================================================
# Seasonal adjustment
# =========================================================
# We apply seasonal decomposition and subtract the estimated seasonal component.
# Period depends on data frequency:
#   quarterly: 4; all other series are treated as monthly: 12
quarterly = metadata.loc[metadata["freq"].str.lower().str.contains("q")][meta_col].values
monthly = metadata.loc[metadata["freq"].str.lower().str.contains("m")][meta_col].values
already_sa = metadata.loc[metadata["sa"]==1][meta_col].values
# Drop columns with fewer than 24 non-NaN observations
data = data.loc[:, data.notna().sum() >= 24]
for column in data.columns:
    if column in already_sa:
        continue
    series = data[column].dropna()
    period = 4 if column in quarterly else 12
    decomp = seasonal_decompose(series, model='additive', extrapolate_trend='freq', period=period)
    data[column] = data[column] - decomp.seasonal
# =========================================================
# Convert levels to growth rates (percent change)
# =========================================================
# Quarterly series: quarter-over-quarter growth rate
# All other series: default percent change at their observed frequency
for column in data.columns:
if column in quarterly:
# Get non-NaN values with their original index
non_nan = data[column].dropna()
# Calculate percent change on non-NaN values only
pct_change_values = non_nan.pct_change(periods=1, fill_method=None) * 100
# Create a new series filled with NaN, then update with calculated values
result = pd.Series(np.nan, index=data.index)
result.loc[pct_change_values.index] = pct_change_values
data[column] = result
else:
# Store original NaN mask
original_was_nan = data[column].isna()
# For monthly/weekly/daily, calculate normally
pct_change_values = data[column].pct_change(periods=1, fill_method=None) * 100
# Set to NaN where previous value was NaN OR current original value was NaN
prev_was_nan = data[column].shift(1).isna()
pct_change_values[prev_was_nan | original_was_nan] = np.nan
data[column] = pct_change_values
data.replace([np.inf, -np.inf], np.nan, inplace=True)
# =========================================================
# Drop non-quarterly columns with more than 75% missing values (within the training window)
# =========================================================
for column in [ col for col in data.columns if col not in quarterly ] :
missing_pct = data[data.index<=test_start_date][column].isna().sum() / len(data[data.index<=test_start_date][column])
if missing_pct > 0.75:
data.drop(column, axis=1, inplace=True)
# =========================================================
# Final cleaning
# =========================================================
# Replace infinities created by pct_change with NaN
data.replace([np.inf, -np.inf], np.nan, inplace=True)
# Drop CPI from predictor set (used only for deflation)
#data.drop(cpi, axis=1, inplace=True)
# Keep data from 2015 onward and remove fully-missing rows
data = data[~(data.index < train_start_date)].reset_index(drop=False)
data = data.set_index("date").dropna(how="all", axis=0)
metadata["months_lag"] = (
pd.to_numeric(metadata["months_lag"], errors="coerce")
.fillna(1)
.astype(int)
)
# Drop GDP components again (already removed above; kept as a safeguard)
# and series whose last observation is before 2024
data = data.drop(columns=[col for col in data.columns if 'RGDP' in col and col != 'RGDP0000'])
for col in data.columns:
    last = data[col].dropna().index.max().year
    if last < 2024:
        data.drop(col, axis=1, inplace=True)
'''
# drop low-hierarchy series
# Pattern: ends with 000x OR has x00x in last 4 digits
pattern = r'.*(\d00\d|000\d)$'
hierarchy = [col for col in data.columns if re.match(pattern, col)]
data = data[hierarchy]
'''
# Deduplicate highly correlated series that proxy a similar underlying DGP
def deduplicate_correlated(df, threshold=0.85, target=target_variable,
metadata=metadata, meta_col=meta_col):
"""Drop one variable from each highly-correlated pair.
Priority: (1) keep aggregate tickets (numeric suffix % 100 == 0);
(2) keep the variable with more non-null observations.
"""
import re
def _is_agg(ticket):
nums = re.findall(r'\d+', str(ticket))
return bool(nums) and int(nums[-1]) % 100 == 0
agg_set = set(metadata.loc[metadata[meta_col].apply(_is_agg), meta_col])
X = df.drop(columns=[target])
obs = X.notna().sum()
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
to_drop = set()
for col in upper.columns:
if col in to_drop:
continue
peers = upper.index[upper[col] > threshold].tolist()
for peer in peers:
if peer in to_drop:
continue
col_agg = col in agg_set
peer_agg = peer in agg_set
if col_agg and not peer_agg:
to_drop.add(peer)
elif peer_agg and not col_agg:
to_drop.add(col)
break
elif obs[col] >= obs[peer]:
to_drop.add(peer)
else:
to_drop.add(col)
break
kept = [c for c in df.columns if c not in to_drop]
print(f"Removed {len(df.columns) - len(kept)} correlated variables "
f"(|r| > {threshold}). {len(kept)} remain.")
return df[kept]
data = deduplicate_correlated(data, threshold=0.98)
# Drop high-publication lag series
tickets_to_keep = metadata[metadata.months_lag <= 5]['ticket'].tolist()
# Keep only these columns (that exist in data)
data = data[[ col for col in data.columns if col in tickets_to_keep or "RGDP" in col ]]
data.to_csv(os.path.join(PATH_OUTPUT, "data_processed.csv"), index=True)
metadata = metadata[metadata.ticket.isin(data.columns)]
metadata.to_csv(os.path.join(PATH_OUTPUT, "meta_processed.csv"), index=True)
# =========================================================
# Reproducibility: print core library versions
# =========================================================
print("================== Library Versions =================")
import sklearn, numpy, pandas
print("scikit-learn:", sklearn.__version__)
print("numpy:", numpy.__version__)
print("pandas:", pandas.__version__)
# Data dimensions (observations and variables)
print("Data dimensions:", data.shape)
print(f"Training dates: {train_start_date.strftime('%Y-%m-%d')} to {(test_start_date - pd.DateOffset(days=1)).strftime('%Y-%m-%d')}")
T1 = data[data.index < test_start_date].shape[0]
T2 = data[(data.index >= test_start_date) & (data.index <= test_end_date)].shape[0]
print(f"Training periods: {T1} ({T1/(T1+T2) :.2%})")
print(f"Testing dates: {test_start_date.strftime('%Y-%m-%d')} to {test_end_date.strftime('%Y-%m-%d')}")
print(f"Testing periods: {T2} ({T2/(T1+T2) :.2%})")
# =========================================================
Key Directories:
Raw inputs: c:\Users\guerr\Dropbox\02_Work\Consulting\2026_NWCST_CAR_Research-Workshop\JAM Workshop\workshop_code/..\raw
Modeling data: c:\Users\guerr\Dropbox\02_Work\Consulting\2026_NWCST_CAR_Research-Workshop\JAM Workshop\workshop_code/..\data
Model outputs: c:\Users\guerr\Dropbox\02_Work\Consulting\2026_NWCST_CAR_Research-Workshop\JAM Workshop\workshop_code/..\output/20260512
# =========================================================
Removed 173 correlated variables (|r| > 0.98). 687 remain.
================== Library Versions =================
scikit-learn: 1.8.0
numpy: 2.4.1
pandas: 2.3.3
Data dimensions: (134, 245)
Training dates: 2015-03-01 to 2023-05-31
Training periods: 99 (76.15%)
Testing dates: 2023-06-01 to 2025-12-01
Testing periods: 31 (23.85%)
Variable Selection¶
To keep the exercise tractable, we restrict the dataset to six predictors with clear economic intuition for Jamaican GDP: a Google Trends indicator, nighttime lights, Jamaican stock market volume, exports to the US, oil prices, and bank loans. Monthly series are aggregated to quarterly frequency to match the GDP release schedule.
Feel free to experiment by swapping in other variables from the full dataset.
data = data[['RGDP0000', 'UGTR0000', "UNTL0000", 'REXC0002', 'XEMP0004', 'XOIL0000', 'MLOA0000']]
agg_map = metadata.set_index('ticket')['agg_method'].to_dict()
agg_funcs = {
col: (agg_map[col] if agg_map[col] in ['mean', 'last', 'sum'] else 'mean')
for col in data.columns if col in agg_map
}
data = data.resample('QE').agg(agg_funcs)
data = data[data.index>"2015-01-01"]
data.dropna(how='any', axis=0, inplace=True)
data.tail(5)
| date | RGDP0000 | UGTR0000 | UNTL0000 | REXC0002 | XEMP0004 | XOIL0000 | MLOA0000 |
|---|---|---|---|---|---|---|---|
| 2024-12-31 | 1.691583 | -0.928915 | 0.374498 | 85.541713 | 5.940967 | 1.173205 | 1.089155 |
| 2025-03-31 | 1.086629 | -0.558565 | 2.875776 | 12.413470 | -6.006168 | -3.481808 | 0.724042 |
| 2025-06-30 | 0.255034 | -1.378025 | 1.072993 | -7.816600 | -3.490598 | 0.055859 | 1.003215 |
| 2025-09-30 | 0.350465 | 6.175978 | 1.142558 | 29.273677 | 11.943489 | -0.089192 | 0.966247 |
| 2025-12-31 | -7.272064 | -3.359553 | -17.157254 | -24.146770 | 8.959772 | -1.213730 | 1.277365 |
With this simplified dataset we now demonstrate the estimation workflow. The same steps (split, fit, predict, evaluate) will apply to (nearly) every other model you estimate below.
train = data[data.index < test_start_date ]
test = data[data.index >= test_start_date ]
X_train = train.drop('RGDP0000', axis=1)
y_train = train['RGDP0000']
X_test = test.drop('RGDP0000', axis=1)
y_obs = test['RGDP0000']
OLS Estimation¶
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
LinearRegression()
Once the model is fit, generate out-of-sample predictions on the test set:
pred = model.predict(X_test)
pred
array([ 0.09364087, -1.73781166, 1.22820084, -0.67165895, -0.23315402,
0.62003632, -0.42694809, 0.66078934, -0.46287158, 3.4027889 ,
-1.8475537 ])
Compare predicted values against the observed GDP growth rates to see where the model performs well and where it struggles:
Table = pd.DataFrame({'Observed': y_obs, 'OLS predicted': pred}, index=y_obs.index)
# Relative error of each prediction: (predicted / observed) - 1 (note: not multiplied by 100)
Table['Percent Error'] = (Table['OLS predicted']/Table['Observed']) - 1
Table
| date | Observed | OLS predicted | Percent Error |
|---|---|---|---|
| 2023-06-30 | -0.066758 | 0.093641 | -2.402688 |
| 2023-09-30 | 1.988200 | -1.737812 | -1.874063 |
| 2023-12-31 | -0.778219 | 1.228201 | -2.578220 |
| 2024-03-31 | -0.054704 | -0.671659 | 11.278039 |
| 2024-06-30 | -0.997453 | -0.233154 | -0.766251 |
| 2024-09-30 | -1.055556 | 0.620036 | -1.587403 |
| 2024-12-31 | 1.691583 | -0.426948 | -1.252396 |
| 2025-03-31 | 1.086629 | 0.660789 | -0.391891 |
| 2025-06-30 | 0.255034 | -0.462872 | -2.814944 |
| 2025-09-30 | 0.350465 | 3.402789 | 8.709343 |
| 2025-12-31 | -7.272064 | -1.847554 | -0.745938 |
Compute the Root Mean Squared Error (RMSE) as a single summary of forecast accuracy. Lower RMSE indicates better out-of-sample performance. Use this value to compare models later.
np.sqrt(np.mean((np.array(Table['Observed']) - np.array(Table['OLS predicted'])) ** 2))
np.float64(2.4426866060397976)
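Equivalently, you can let scikit-learn compute the mean squared error and take the square root yourself — a sketch re-using the Table built above:
from sklearn.metrics import mean_squared_error
rmse_ols = np.sqrt(mean_squared_error(Table['Observed'], Table['OLS predicted']))
print(f"OLS RMSE : {rmse_ols:.4f}")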
Machine Learning Models¶
Now it's your turn. Estimate each model below using the training data and evaluate it on the test data. Use the helper functions in the Helper Function Reference section as your guide to how each model is constructed internally.
For each model below:
- Fit the model on X_train / y_train
- Predict on X_test
- Build a comparison table of observed vs. predicted values (as done above for OLS)
- Compute RMSE and note how it compares to OLS
All models follow the same basic pattern:
model = SomeModel(param1=..., param2=...)
model.fit(X_train, y_train)
pred = model.predict(X_test)
rmse = np.sqrt(np.mean((y_obs - pred) ** 2))
Hints:
- All sklearn estimators share the .fit() / .predict() interface — only the class name and parameters change.
- For tree-based models (DT, RF, GBT), the helper functions average predictions across multiple independently fit models. You can simplify this to a single model for the exercise.
- For LSTM, you will need to use the nowcast_lstm package. Look at fit_lstm and predict_lstm carefully — the data format is different from sklearn.
Helper Function Reference¶
The code block below shows all the fit_* and predict_* functions used in the full nowcasting pipeline. Read through them carefully — they show exactly how each model is instantiated, what parameters it accepts, and how predictions are extracted. Use them as your blueprint for the exercises that follow.
# =============================================================
# OLS — Ordinary Least Squares
# =============================================================
def fit_ols(ytrain, xtrain, target_variable=target_variable):
model = LinearRegression()
return model.fit(xtrain, ytrain), xtrain.columns
def predict_ols(model, X, train_vars, date, pred_dict, l, target_variable=target_variable):
pred = model.predict(X[train_vars])[0]
pred_dict[l].append(pred)
return pred_dict
# =============================================================
# OLS-R — Ridge Regression (L2 penalty, cross-validated α)
# =============================================================
def fit_olsr(ytrain, xtrain, target_variable=target_variable,
alphas=[0.0001, 0.001, 0.01, 0.1, 1, 10, 20]):
model = RidgeCV(alphas=alphas)
return model.fit(xtrain, ytrain), xtrain.columns
def predict_olsr(model, X, train_vars, date, pred_dict, l, target_variable=target_variable):
pred = model.predict(X[train_vars])[0]
pred_dict[l].append(pred)
return pred_dict
# =============================================================
# LASSO — L1-penalized Regression (automatic variable selection)
# =============================================================
def fit_lasso(ytrain, xtrain, target_variable=target_variable, alpha=1e-5):
model = Lasso(alpha=alpha)
return model.fit(xtrain, ytrain), xtrain.columns
def predict_lasso(model, X, train_vars, date, pred_dict, l, target_variable=target_variable):
pred = model.predict(X[train_vars])[0]
pred_dict[l].append(pred)
return pred_dict
# =============================================================
# ENET — Elastic Net (L1 + L2 combined penalty)
# =============================================================
def fit_enet(ytrain, xtrain, target_variable=target_variable,
params={'alpha': 1e-5, 'l1_ratio': 0.25}):
model = ElasticNet(alpha=params['alpha'], l1_ratio=params['l1_ratio'])
return model.fit(xtrain, ytrain), xtrain.columns
def predict_enet(model, X, train_vars, date, pred_dict, l, target_variable=target_variable):
pred = model.predict(X[train_vars])[0]
pred_dict[l].append(pred)
return pred_dict
# =============================================================
# DT — Decision Tree (ensemble of ModelN independent trees)
# =============================================================
def fit_dt(ytrain, xtrain, target_variable=target_variable, ModelN=200):
models = []
for _ in range(ModelN):
model = DecisionTreeRegressor(
criterion='absolute_error',
min_samples_split=6,
min_samples_leaf=2,
)
model.fit(xtrain, ytrain)
models.append(model)
return models, xtrain.columns
def predict_dt(model, X, train_vars, date, pred_dict, l, target_variable=target_variable):
preds = [m.predict(X[train_vars])[0] for m in model]
pred_dict[l].append(np.nanmean(preds))
return pred_dict
# =============================================================
# RF — Random Forest (ensemble of ModelN independent forests)
# =============================================================
def fit_rf(ytrain, xtrain, target_variable=target_variable,
ModelN=200, n_estimators=130):
models = []
for _ in range(ModelN):
model = RandomForestRegressor(
n_estimators=n_estimators,
criterion='absolute_error',
max_features=18,
min_samples_split=4,
min_samples_leaf=2,
)
model.fit(xtrain, ytrain)
models.append(model)
return models, xtrain.columns
def predict_rf(model, X, train_vars, date, pred_dict, l, target_variable=target_variable):
preds = [m.predict(X[train_vars])[0] for m in model]
pred_dict[l].append(np.nanmean(preds))
return pred_dict
# =============================================================
# GBT — Gradient Boosting Tree (sequential ensemble)
# =============================================================
def fit_gbt(ytrain, xtrain, target_variable=target_variable,
ModelN=200, n_estimators=100, learning=0.15):
models = []
for _ in range(ModelN):
model = GradientBoostingRegressor(
n_estimators=n_estimators,
learning_rate=learning,
loss='absolute_error',
min_samples_split=6,
min_samples_leaf=3,
)
model.fit(xtrain, ytrain)
models.append(model)
return models, xtrain.columns
def predict_gbt(model, X, train_vars, date, pred_dict, l, target_variable=target_variable):
preds = [m.predict(X[train_vars])[0] for m in model]
pred_dict[l].append(np.nanmean(preds))
return pred_dict
# =============================================================
# LSTM — Long Short-Term Memory (recurrent neural network)
# =============================================================
def fit_lstm(ttrain, gdp_lags=0, target_variable=target_variable,
params={
"n_timesteps": 12,
"fill_na_func": np.nanmean,
"fill_ragged_edges_func": np.nanmean,
"n_models": 100,
"train_episodes": 100,
"batch_size": 50,
"decay": 0.98,
"n_hidden": 10,
"n_layers": 1,
"dropout": 0.0,
"criterion": torch.nn.MSELoss(),
"optimizer": torch.optim.Adam,
"optimizer_parameters": {"lr": 1e-2, "weight_decay": 0.0},
}):
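    # NOTE: lagged_target() is a helper defined elsewhere in the full pipeline (not shown in this reference)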
train = lagged_target(ttrain, gdp_lags)
model = LSTM(data=train, target_variable=target_variable, **params)
model.train(quiet=True)
return model, train.drop(["date", target_variable], axis=1).columns
def predict_lstm(model, X, train_vars, date, pred_dict, l,
gdp_lags=0, target_variable=target_variable):
X = lagged_target(X, gdp_lags)
pred = model.predict(X).loc[lambda x: x.date == date, "predictions"].values[0]
pred_dict[l].append(pred)
return pred_dict
LASSO¶
LASSO (Least Absolute Shrinkage and Selection Operator) adds an L1 penalty to the OLS objective. Unlike Ridge, the L1 penalty tends to push small coefficients to exactly zero, effectively performing automatic variable selection. This is useful when only a subset of the many available predictors are truly informative.
Key parameter:
| Parameter | Default | Effect |
|---|---|---|
| alpha | 1e-5 | Strength of the L1 penalty. Higher alpha → more coefficients shrunk to zero → sparser, simpler model. Lower alpha → fewer variables dropped, approaches plain OLS. |
Setting alpha too high will zero out too many predictors and underfit; too low and regularization has no practical effect. Try values across several orders of magnitude (e.g., 1e-4, 1e-3, 0.01, 0.1) to see how the model changes.
To implement: use Lasso(alpha=...), then .fit() and .predict(). Inspect .coef_ to see which coefficients were zeroed out.
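As a rough sketch of that experiment (re-using X_train, y_train, X_test, and y_obs from above; the alpha grid is just an example):
from sklearn.linear_model import Lasso
for a in [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]:
    m = Lasso(alpha=a)
    m.fit(X_train, y_train)
    p = m.predict(X_test)
    rmse = np.sqrt(np.mean((y_obs.values - p) ** 2))
    n_zero = int((m.coef_ == 0).sum())
    print(f"alpha={a:g}: RMSE={rmse:.4f}, zero coefficients={n_zero}/{len(m.coef_)}")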
from sklearn.linear_model import Lasso
model_lasso = Lasso(alpha=1e-5)
model_lasso.fit(X_train, y_train)
pred_lasso = model_lasso.predict(X_test)
Table_lasso = pd.DataFrame({'Observed': y_obs, 'LASSO predicted': pred_lasso}, index=y_obs.index)
Table_lasso['Percent Error'] = (Table_lasso['LASSO predicted'] / Table_lasso['Observed']) - 1
rmse_lasso = np.sqrt(np.mean((Table_lasso['Observed'] - Table_lasso['LASSO predicted']) ** 2))
print(f"LASSO RMSE : {rmse_lasso:.4f}")
# Inspect which coefficients LASSO set to zero (variable selection)
coef_lasso = pd.Series(model_lasso.coef_, index=X_train.columns)
print(f"Coefficients set to zero: {(coef_lasso == 0).sum()} / {len(coef_lasso)}")
Table_lasso
LASSO RMSE : 2.4427
Coefficients set to zero: 0 / 6
| date | Observed | LASSO predicted | Percent Error |
|---|---|---|---|
| 2023-06-30 | -0.066758 | 0.093648 | -2.402792 |
| 2023-09-30 | 1.988200 | -1.737806 | -1.874060 |
| 2023-12-31 | -0.778219 | 1.228207 | -2.578228 |
| 2024-03-31 | -0.054704 | -0.671645 | 11.277789 |
| 2024-06-30 | -0.997453 | -0.233140 | -0.766264 |
| 2024-09-30 | -1.055556 | 0.620043 | -1.587409 |
| 2024-12-31 | 1.691583 | -0.426943 | -1.252392 |
| 2025-03-31 | 1.086629 | 0.660796 | -0.391884 |
| 2025-06-30 | 0.255034 | -0.462865 | -2.814916 |
| 2025-09-30 | 0.350465 | 3.402779 | 8.709314 |
| 2025-12-31 | -7.272064 | -1.847534 | -0.745941 |
OLS Ridge (Ridge Regression)¶
Ridge regression adds an L2 penalty to the OLS objective, which shrinks all coefficients toward zero without setting any of them exactly to zero. It retains all predictors but imposes a cost on large coefficients, trading off some bias for lower variance. It is particularly effective when many predictors each contribute small but non-zero effects (as is typical in macro nowcasting with many correlated indicators).
The helper function uses RidgeCV, which selects the optimal penalty strength automatically via cross-validation over a candidate grid.
Key parameter:
| Parameter | Default | Effect |
|---|---|---|
| alphas | [0.0001, 0.001, 0.01, 0.1, 1, 10, 20] | Grid of penalty strengths to search over. Larger candidates impose stronger shrinkage; smaller candidates approach plain OLS. A wider or denser grid gives a more thorough search at the cost of computation. |
To implement: use RidgeCV(alphas=[...]), then .fit() and .predict() as before. The fitted model's .alpha_ attribute shows which penalty was chosen by cross-validation.
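For example, a denser grid spanning several orders of magnitude could look like this (a sketch; the np.logspace bounds are arbitrary and the training objects are those defined above):
from sklearn.linear_model import RidgeCV
dense_grid = np.logspace(-4, 2, 25)   # 25 candidate alphas between 1e-4 and 1e2
m = RidgeCV(alphas=dense_grid)
m.fit(X_train, y_train)
print(f"Selected alpha: {m.alpha_:.4f}")
print(f"Test RMSE     : {np.sqrt(np.mean((y_obs.values - m.predict(X_test)) ** 2)):.4f}")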
from sklearn.linear_model import RidgeCV
model_ridge = RidgeCV(alphas=[0.0001, 0.001, 0.01, 0.1, 1, 10, 20])
model_ridge.fit(X_train, y_train)
pred_ridge = model_ridge.predict(X_test)
Table_ridge = pd.DataFrame({'Observed': y_obs, 'Ridge predicted': pred_ridge}, index=y_obs.index)
Table_ridge['Percent Error'] = (Table_ridge['Ridge predicted'] / Table_ridge['Observed']) - 1
rmse_ridge = np.sqrt(np.mean((Table_ridge['Observed'] - Table_ridge['Ridge predicted']) ** 2))
print(f"Ridge RMSE : {rmse_ridge:.4f}")
print(f"Selected alpha: {model_ridge.alpha_}") # penalty chosen by cross-validation
Table_ridge
Ridge RMSE : 2.4538
Selected alpha: 20.0
| date | Observed | Ridge predicted | Percent Error |
|---|---|---|---|
| 2023-06-30 | -0.066758 | 0.046555 | -1.697362 |
| 2023-09-30 | 1.988200 | -1.517953 | -1.763481 |
| 2023-12-31 | -0.778219 | 1.305987 | -2.678175 |
| 2024-03-31 | -0.054704 | -0.455791 | 7.331936 |
| 2024-06-30 | -0.997453 | -0.036049 | -0.963859 |
| 2024-09-30 | -1.055556 | 0.737153 | -1.698355 |
| 2024-12-31 | 1.691583 | -0.296833 | -1.175476 |
| 2025-03-31 | 1.086629 | 0.766231 | -0.294855 |
| 2025-06-30 | 0.255034 | -0.312660 | -2.225956 |
| 2025-09-30 | 0.350465 | 3.036095 | 7.663038 |
| 2025-12-31 | -7.272064 | -1.467452 | -0.798207 |
Elastic Net (ENET)¶
Elastic Net combines L1 (LASSO) and L2 (Ridge) penalties into a single regularization term. The L1 component performs variable selection; the L2 component stabilizes estimation when predictors are highly correlated with each other (Ridge tends to keep correlated variables together rather than arbitrarily dropping one). Elastic Net is a flexible middle ground: it can be sparse like LASSO while remaining stable like Ridge.
Key parameters:
| Parameter | Default | Effect |
|---|---|---|
| alpha | 1e-5 | Overall regularization strength. Higher → more shrinkage toward zero for all coefficients. |
| l1_ratio | 0.25 | Mix between L1 and L2 penalties. l1_ratio=1 → pure LASSO (sparse); l1_ratio=0 → pure Ridge (all variables retained). Values between blend both. |
Increasing l1_ratio produces sparser models; decreasing it produces smoother, more stable estimates when predictors are collinear. A good starting point is to vary l1_ratio across [0.1, 0.25, 0.5, 0.75, 0.9] to understand how the penalty mix affects which variables survive.
To implement: use ElasticNet(alpha=..., l1_ratio=...), then .fit() and .predict().
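A sketch of that sweep (re-using the training objects defined above; alpha is held at the default 1e-5):
from sklearn.linear_model import ElasticNet
for r in [0.1, 0.25, 0.5, 0.75, 0.9]:
    m = ElasticNet(alpha=1e-5, l1_ratio=r)
    m.fit(X_train, y_train)
    p = m.predict(X_test)
    rmse = np.sqrt(np.mean((y_obs.values - p) ** 2))
    print(f"l1_ratio={r:.2f}: RMSE={rmse:.4f}, zero coefficients={(m.coef_ == 0).sum()}/{len(m.coef_)}")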
from sklearn.linear_model import ElasticNet
model_enet = ElasticNet(alpha=1e-5, l1_ratio=0.25)
model_enet.fit(X_train, y_train)
pred_enet = model_enet.predict(X_test)
Table_enet = pd.DataFrame({'Observed': y_obs, 'ENET predicted': pred_enet}, index=y_obs.index)
Table_enet['Percent Error'] = (Table_enet['ENET predicted'] / Table_enet['Observed']) - 1
rmse_enet = np.sqrt(np.mean((Table_enet['Observed'] - Table_enet['ENET predicted']) ** 2))
print(f"ENET RMSE : {rmse_enet:.4f}")
# l1_ratio=0.25 means 25% LASSO + 75% Ridge penalty mix
coef_enet = pd.Series(model_enet.coef_, index=X_train.columns)
print(f"Coefficients set to zero: {(coef_enet == 0).sum()} / {len(coef_enet)}")
Table_enet
ENET RMSE : 2.4427
Coefficients set to zero: 0 / 6
| date | Observed | ENET predicted | Percent Error |
|---|---|---|---|
| 2023-06-30 | -0.066758 | 0.093644 | -2.402729 |
| 2023-09-30 | 1.988200 | -1.737808 | -1.874061 |
| 2023-12-31 | -0.778219 | 1.228204 | -2.578225 |
| 2024-03-31 | -0.054704 | -0.671651 | 11.277895 |
| 2024-06-30 | -0.997453 | -0.233147 | -0.766258 |
| 2024-09-30 | -1.055556 | 0.620041 | -1.587407 |
| 2024-12-31 | 1.691583 | -0.426945 | -1.252394 |
| 2025-03-31 | 1.086629 | 0.660794 | -0.391887 |
| 2025-06-30 | 0.255034 | -0.462867 | -2.814927 |
| 2025-09-30 | 0.350465 | 3.402782 | 8.709324 |
| 2025-12-31 | -7.272064 | -1.847543 | -0.745940 |
Decision Tree (DT)¶
A decision tree recursively partitions the feature space into rectangular regions and predicts the mean outcome within each region. It is non-linear and can capture interactions between predictors without any explicit specification. However, a single tree tends to overfit (it memorizes the training data rather than learning generalizable patterns). The helper function addresses this by averaging predictions across ModelN independently trained trees (a simple ensemble).
Key parameters:
| Parameter | Default | Effect |
|---|---|---|
| ModelN | 200 | Number of independently fit trees to average. Higher → more stable predictions; lower → faster but noisier. |
| min_samples_split | 6 | Minimum number of observations required to split an internal node. Higher → shallower trees, less overfitting. |
| min_samples_leaf | 2 | Minimum number of observations required in a leaf node. Higher → simpler trees, more regularization. |
| criterion | "absolute_error" | Loss function used to evaluate candidate splits. "absolute_error" (MAE-based) is more robust to outliers than "squared_error" (MSE-based). |
Increasing min_samples_split and min_samples_leaf prevents trees from fitting to noise. For this exercise, you can start with a single tree (ModelN=1) and then try averaging a few.
To implement: use DecisionTreeRegressor(criterion=..., min_samples_split=..., min_samples_leaf=...), then .fit() and .predict().
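If you then want to mimic the helper's averaging, a small sketch follows (ModelN=10 here is illustrative, much smaller than the pipeline default of 200):
from sklearn.tree import DecisionTreeRegressor
ModelN = 10
tree_preds = []
for _ in range(ModelN):
    m = DecisionTreeRegressor(criterion='absolute_error',
                              min_samples_split=6, min_samples_leaf=2)
    m.fit(X_train, y_train)
    tree_preds.append(m.predict(X_test))
pred_dt_avg = np.mean(tree_preds, axis=0)   # average across the independently fit trees
print(f"Averaged DT RMSE : {np.sqrt(np.mean((y_obs.values - pred_dt_avg) ** 2)):.4f}")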
from sklearn.tree import DecisionTreeRegressor
model_dt = DecisionTreeRegressor(
criterion='absolute_error',
min_samples_split=6,
min_samples_leaf=2,
)
model_dt.fit(X_train, y_train)
pred_dt = model_dt.predict(X_test)
Table_dt = pd.DataFrame({'Observed': y_obs, 'DT predicted': pred_dt}, index=y_obs.index)
Table_dt['Percent Error'] = (Table_dt['DT predicted'] / Table_dt['Observed']) - 1
rmse_dt = np.sqrt(np.mean((Table_dt['Observed'] - Table_dt['DT predicted']) ** 2))
print(f"DT RMSE : {rmse_dt:.4f}")
Table_dt
DT RMSE : 2.9732
| date | Observed | DT predicted | Percent Error |
|---|---|---|---|
| 2023-06-30 | -0.066758 | 1.917136 | -29.717618 |
| 2023-09-30 | 1.988200 | 0.268370 | -0.865019 |
| 2023-12-31 | -0.778219 | 1.917136 | -3.463491 |
| 2024-03-31 | -0.054704 | 1.917136 | -36.045561 |
| 2024-06-30 | -0.997453 | 0.481795 | -1.483026 |
| 2024-09-30 | -1.055556 | 1.917136 | -2.816233 |
| 2024-12-31 | 1.691583 | 1.917136 | 0.133338 |
| 2025-03-31 | 1.086629 | 1.917136 | 0.764296 |
| 2025-06-30 | 0.255034 | 0.376716 | 0.477125 |
| 2025-09-30 | 0.350465 | 1.917136 | 4.470256 |
| 2025-12-31 | -7.272064 | 0.787645 | -1.108311 |
Random Forest (RF)¶
Random Forest builds an ensemble of decision trees where each tree is trained on a bootstrap sample of the data and considers only a random subset of features at each split. This double randomization de-correlates the individual trees, making the ensemble far more robust to overfitting than any single tree. Predictions are averaged across all trees in the forest. The helper function goes one step further by averaging across ModelN independently trained forests.
Key parameters:
| Parameter | Default | Effect |
|---|---|---|
| ModelN | 200 | Number of independently fit forests to average. More → more stable predictions, slower. |
| n_estimators | 130 | Number of trees per forest. More trees → lower variance within each forest; beyond ~200 the gain is marginal. |
| max_features | 18 | Number of features randomly considered at each split. Lower → more diversity across trees (less correlated predictions), at the cost of potentially weaker individual trees. |
| min_samples_split | 4 | Minimum observations to split a node. Higher → shallower, more regularized trees. |
| min_samples_leaf | 2 | Minimum observations in a leaf. Higher → simpler trees. |
The most impactful parameters are n_estimators (diminishing returns above ~100) and max_features (the primary source of tree diversity). For the exercise, a single RandomForestRegressor with moderate n_estimators is sufficient.
To implement: use RandomForestRegressor(n_estimators=..., max_features=..., ...), then .fit() and .predict().
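A sketch of varying max_features on this six-predictor dataset (values cannot exceed the number of columns; random_state is fixed only so the comparison is repeatable):
from sklearn.ensemble import RandomForestRegressor
for k in [2, 4, 6]:
    m = RandomForestRegressor(n_estimators=100, criterion='absolute_error',
                              max_features=k, min_samples_split=4,
                              min_samples_leaf=2, random_state=0)
    m.fit(X_train, y_train)
    p = m.predict(X_test)
    print(f"max_features={k}: RMSE={np.sqrt(np.mean((y_obs.values - p) ** 2)):.4f}")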
from sklearn.ensemble import RandomForestRegressor
model_rf = RandomForestRegressor(
n_estimators=100,
criterion='absolute_error',
min_samples_split=4,
min_samples_leaf=2,
)
model_rf.fit(X_train, y_train)
pred_rf = model_rf.predict(X_test)
Table_rf = pd.DataFrame({'Observed': y_obs, 'RF predicted': pred_rf}, index=y_obs.index)
Table_rf['Percent Error'] = (Table_rf['RF predicted'] / Table_rf['Observed']) - 1
rmse_rf = np.sqrt(np.mean((Table_rf['Observed'] - Table_rf['RF predicted']) ** 2))
print(f"RF RMSE : {rmse_rf:.4f}")
# Feature importances: how much each predictor contributed to splits
importances_rf = pd.Series(model_rf.feature_importances_, index=X_train.columns).sort_values(ascending=False)
print(f"\nFeature importances:\n{importances_rf.round(3)}")
Table_rf
RF RMSE : 2.5753

Feature importances:
UGTR0000    0.225
XOIL0000    0.195
XEMP0004    0.183
MLOA0000    0.179
REXC0002    0.111
UNTL0000    0.106
dtype: float64
| date | Observed | RF predicted | Percent Error |
|---|---|---|---|
| 2023-06-30 | -0.066758 | 0.509454 | -8.631331 |
| 2023-09-30 | 1.988200 | 0.412047 | -0.792754 |
| 2023-12-31 | -0.778219 | 1.345986 | -2.729572 |
| 2024-03-31 | -0.054704 | -0.395030 | 6.221216 |
| 2024-06-30 | -0.997453 | 0.430901 | -1.432002 |
| 2024-09-30 | -1.055556 | -0.382904 | -0.637249 |
| 2024-12-31 | 1.691583 | 1.159510 | -0.314541 |
| 2025-03-31 | 1.086629 | 1.340008 | 0.233179 |
| 2025-06-30 | 0.255034 | 0.524117 | 1.055091 |
| 2025-09-30 | 0.350465 | 2.285074 | 5.520113 |
| 2025-12-31 | -7.272064 | 0.399425 | -1.054926 |
Gradient Boosting Tree (GBT)¶
Gradient Boosting builds an ensemble of trees sequentially: each new tree is fit to the residual errors of the current ensemble, so the model corrects its own mistakes stage by stage. Unlike Random Forest (which averages independent trees), GBT fits trees in an additive, stage-wise manner and tends to achieve better accuracy (but it is also prone to overfitting if not properly regularized). The helper function averages predictions across ModelN independently trained boosted ensembles.
Key parameters:
| Parameter | Default | Effect |
|---|---|---|
| ModelN | 200 | Number of independently fit GBT ensembles to average. More → more stable, slower. |
| n_estimators | 100 | Number of boosting stages (sequential trees). More stages → lower training error but higher risk of overfitting. |
| learning_rate | 0.15 | Shrinkage applied to each tree's contribution. Lower → each tree contributes less, requiring more stages; acts as regularization. Typical values: 0.01–0.2. |
| min_samples_split | 6 | Minimum observations to split a node. Higher → shallower trees, less overfitting. |
| min_samples_leaf | 3 | Minimum observations in a leaf. Higher → simpler trees. |
There is a classic trade-off between learning_rate and n_estimators: lower learning rates require more boosting stages to achieve good fit, but they generalize better. A good rule of thumb is to lower the learning rate and increase n_estimators proportionally.
To implement: use GradientBoostingRegressor(n_estimators=..., learning_rate=..., ...), then .fit() and .predict().
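A sketch of that trade-off, halving the learning rate while doubling the number of stages (the pairs are illustrative; random_state is fixed for a repeatable comparison):
from sklearn.ensemble import GradientBoostingRegressor
for lr, n in [(0.15, 100), (0.075, 200), (0.0375, 400)]:
    m = GradientBoostingRegressor(n_estimators=n, learning_rate=lr,
                                  loss='absolute_error',
                                  min_samples_split=6, min_samples_leaf=3,
                                  random_state=0)
    m.fit(X_train, y_train)
    p = m.predict(X_test)
    print(f"learning_rate={lr:<7} n_estimators={n:<4}: RMSE={np.sqrt(np.mean((y_obs.values - p) ** 2)):.4f}")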
from sklearn.ensemble import GradientBoostingRegressor
model_gbt = GradientBoostingRegressor(
n_estimators=100,
learning_rate=0.15,
loss='absolute_error',
min_samples_split=6,
min_samples_leaf=3,
)
model_gbt.fit(X_train, y_train)
pred_gbt = model_gbt.predict(X_test)
Table_gbt = pd.DataFrame({'Observed': y_obs, 'GBT predicted': pred_gbt}, index=y_obs.index)
Table_gbt['Percent Error'] = (Table_gbt['GBT predicted'] / Table_gbt['Observed']) - 1
rmse_gbt = np.sqrt(np.mean((Table_gbt['Observed'] - Table_gbt['GBT predicted']) ** 2))
print(f"GBT RMSE : {rmse_gbt:.4f}")
# Feature importances: accumulated gain from splits on each predictor
importances_gbt = pd.Series(model_gbt.feature_importances_, index=X_train.columns).sort_values(ascending=False)
print(f"\nFeature importances:\n{importances_gbt.round(3)}")
Table_gbt
GBT RMSE : 2.4270

Feature importances:
XOIL0000    0.241
MLOA0000    0.236
UGTR0000    0.210
XEMP0004    0.159
UNTL0000    0.108
REXC0002    0.046
dtype: float64
| date | Observed | GBT predicted | Percent Error |
|---|---|---|---|
| 2023-06-30 | -0.066758 | 1.254383 | -19.789948 |
| 2023-09-30 | 1.988200 | 1.140788 | -0.426221 |
| 2023-12-31 | -0.778219 | 0.971018 | -2.247744 |
| 2024-03-31 | -0.054704 | 1.280807 | -24.413364 |
| 2024-06-30 | -0.997453 | 0.080192 | -1.080396 |
| 2024-09-30 | -1.055556 | 1.270263 | -2.203406 |
| 2024-12-31 | 1.691583 | 0.452253 | -0.732645 |
| 2025-03-31 | 1.086629 | 1.297433 | 0.193999 |
| 2025-06-30 | 0.255034 | 0.488184 | 0.914195 |
| 2025-09-30 | 0.350465 | 0.752408 | 1.146883 |
| 2025-12-31 | -7.272064 | -0.263602 | -0.963751 |
Long Short-Term Memory (LSTM)¶
LSTM is a type of recurrent neural network designed to capture temporal dependencies across sequential data. Unlike the regression models above, the LSTM processes a window of past observations at each time step and learns which historical patterns are predictive of GDP growth. The nowcast_lstm package handles the ragged-edge structure of real-time data, where different indicators are released at different times within a quarter.
Key parameters:
| Parameter | Default | Effect |
|---|---|---|
| n_models | 100 | Number of LSTM models trained and averaged (ensemble). More → more stable predictions, much slower training. |
| train_episodes | 100 | Training epochs per model. More → better convergence; too many can overfit on small samples. |
| n_timesteps | 12 | Length of the historical window (periods) fed to the LSTM at each step. Longer → captures more temporal dynamics; requires more data to estimate reliably. |
| n_hidden | 10 | Number of hidden units in the LSTM cell. More → higher model capacity, higher overfitting risk on small samples. |
| n_layers | 1 | Number of stacked LSTM layers. Deeper → can learn more complex dynamics; harder to train with short time series. |
| learning_rate (lr) | 1e-2 | Step size for the Adam optimizer. Too high → unstable training; too low → slow convergence. |
| decay | 0.98 | Learning rate decay per epoch. Values near 1 → very slow decay; smaller values → faster decay. |
| dropout | 0.0 | Fraction of hidden units randomly dropped during training. Useful regularization when n_hidden is large. |
| batch_size | 50 | Samples per gradient update. Smaller → noisier but often better-generalizing gradients. |
For small macro datasets like this one, keep n_hidden and n_layers small to avoid overfitting. The most impactful parameters to experiment with are n_models, train_episodes, and n_timesteps.
Note: the LSTM expects data in a specific format with a date column. Study fit_lstm and predict_lstm in the helper reference above before implementing.
# ── 1. Prepare data ───────────────────────────────────────────────────────────
# LSTM expects 'date' as a column, not an index.
train_lstm = train.reset_index() # columns: date, RGDP0000, ...
test_lstm = test.reset_index()
# predict() expects the full dataset (train + test), not just the test subset.
full_lstm = pd.concat([train_lstm, test_lstm]).reset_index(drop=True)
# ── 2. Parameters ─────────────────────────────────────────────────────────────
# n_timesteps=4 → 1 year of quarterly history per sequence (appropriate for
# short macro panels; use 8–12 with longer monthly datasets).
# n_models/train_episodes are kept small here for exercise speed.
# In production (full pipeline) use n_models=50–100, train_episodes=100+.
lstm_params = {
"n_timesteps" : 4,
"fill_na_func" : np.nanmean,
"fill_ragged_edges_func" : np.nanmean,
"n_models" : 5,
"train_episodes" : 50,
"batch_size" : 30,
"decay" : 0.98,
"n_hidden" : 8,
"n_layers" : 1,
"dropout" : 0.0,
"criterion" : torch.nn.MSELoss(),
"optimizer" : torch.optim.Adam,
"optimizer_parameters" : {"lr": 1e-2, "weight_decay": 0.0},
}
# ── 3. Fit ────────────────────────────────────────────────────────────────────
lstm_model = LSTM(data=train_lstm, target_variable=target_variable, **lstm_params)
lstm_model.train(quiet=False)
# ── 4. Predict ────────────────────────────────────────────────────────────────
# Pass the full dataset so the model can generate predictions for every date.
# Then filter to the test window.
preds_df = lstm_model.predict(full_lstm)
test_preds = (preds_df[preds_df["date"].isin(test_lstm["date"])]
.set_index("date")["predictions"]
.values)
# ── 5. Evaluate ───────────────────────────────────────────────────────────────
Table_lstm = pd.DataFrame(
{"Observed": y_obs, "LSTM predicted": test_preds},
index=y_obs.index,
)
Table_lstm["Percent Error"] = (Table_lstm["LSTM predicted"] / Table_lstm["Observed"]) - 1
rmse_lstm = np.sqrt(np.mean((y_obs.values - test_preds) ** 2))
print(f"LSTM RMSE : {rmse_lstm:.4f}")
Table_lstm
Training model 1
step : 0 loss : 14.040483474731445
...
step : 49 loss : 7.150623798370361
Training model 2 ... Training model 5
(per-step training-loss output for the remaining models truncated)
LSTM RMSE : 2.6384
OLS RMSE : 2.4427 (baseline)
| date | Observed | LSTM predicted | Percent Error |
|---|---|---|---|
| 2023-06-30 | -0.066758 | 0.697591 | -11.449530 |
| 2023-09-30 | 1.988200 | 1.455597 | -0.267882 |
| 2023-12-31 | -0.778219 | 0.840349 | -2.079836 |
| 2024-03-31 | -0.054704 | 1.516490 | -28.721686 |
| 2024-06-30 | -0.997453 | 0.535358 | -1.536726 |
| 2024-09-30 | -1.055556 | 0.714315 | -1.676719 |
| 2024-12-31 | 1.691583 | 1.502319 | -0.111886 |
| 2025-03-31 | 1.086629 | 1.165029 | 0.072150 |
| 2025-06-30 | 0.255034 | 0.598481 | 1.346674 |
| 2025-09-30 | 0.350465 | 0.695358 | 0.984098 |
| 2025-12-31 | -7.272064 | 0.781064 | -1.107406 |