Results¶
Estimation results are stored in three files: the point estimates (forecasts and nowcasts), the performance metrics (RMSE), and the full out-of-sample prediction panel.
import numpy as np
import pandas as pd
import os
date_str = "20260511"
PATH_OUTPUT = f"{os.getcwd()}/../output/{date_str}/"
forecast_file = f"{date_str} 3_nwcst Tab fcst.xlsx" # model forecasts / nowcasts
performance_file = f"{date_str} 3_nwcst Tab perf.xlsx" # model performance metrics
outofsample_file = f"{date_str} 3_nwcst Tab oos.xlsx" # full OOS prediction panel
results_table = pd.read_excel(f"{PATH_OUTPUT}/{forecast_file}", index_col='date')
results_table.index = pd.to_datetime(results_table.index, format="%Y-%m-%d")
performance_table = pd.read_excel(f"{PATH_OUTPUT}/{performance_file}", index_col='Vintage')
#outofsample_table = pd.read_excel(f"{PATH_OUTPUT}/{outofsample_file}", index_col='date')
Inspect the performance table. It reports each model's prediction error (RMSE) broken down by forecasting horizon ("Vintage"): negative values mean the target quarter's data had already been published when the model ran; positive values mean the model was predicting quarters that had not yet been released.
performance_table
| RMSE | estimator | spec | |
|---|---|---|---|
| Vintage | |||
| -2 | 2.298953 | OLS | lars46 |
| -1 | 2.291649 | OLS | lars46 |
| 0 | 2.624861 | OLS | lars46 |
| 1 | 12.721925 | OLS | lars46 |
| 2 | 13.272481 | OLS | lars46 |
| ... | ... | ... | ... |
| -2 | 2.600459 | LSTM | lars49 |
| -1 | 2.520544 | LSTM | lars49 |
| 0 | 2.603486 | LSTM | lars49 |
| 1 | 2.549008 | LSTM | lars49 |
| 2 | 2.649067 | LSTM | lars49 |
80 rows × 3 columns
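To compare estimators side by side, the long performance table can be pivoted so each estimator becomes a column. A minimal sketch on an invented miniature of the table (the real data has one row per vintage-estimator-spec combination):

```python
import pandas as pd

# Toy miniature of the performance table; the RMSE values are invented
perf = pd.DataFrame({
    'Vintage':   [-1, -1, 0, 0],
    'estimator': ['OLS', 'LSTM', 'OLS', 'LSTM'],
    'RMSE':      [2.29, 2.52, 2.62, 2.60],
})

# One row per vintage, one column per estimator
wide = perf.pivot(index='Vintage', columns='estimator', values='RMSE')
print(wide)
```

With multiple specs per estimator, `pivot_table` with an aggregation function would be needed instead of `pivot`.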
Check which model specifications are present in the performance data.
performance_table['spec'].unique()
array(['lars46', 'lars49'], dtype=object)
Inspect the results table. This contains the actual nowcast values produced by each model, for each quarter and each forecasting horizon.
results_table
| actuals | two_back | one_back | zero_back | one_ahead | two_ahead | estimator | spec | |
|---|---|---|---|---|---|---|---|---|
| date | ||||||||
| 2023-06-01 | -0.066758 | -0.419722 | -0.332563 | -1.330659 | -35.239000 | -36.206200 | OLS | lars46 |
| 2023-09-01 | 1.988200 | -0.308410 | -0.357085 | -0.358990 | -7.257542 | -1.693524 | OLS | lars46 |
| 2023-12-01 | -0.778219 | -0.228725 | 0.150708 | 0.873922 | 2.111830 | 1.235066 | OLS | lars46 |
| 2024-03-01 | -0.054704 | -0.075191 | 0.255990 | 0.315120 | -1.867632 | 0.578832 | OLS | lars46 |
| 2024-06-01 | -0.997453 | -0.277602 | -0.212376 | 0.435814 | -4.414146 | -5.344480 | OLS | lars46 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-12-01 | 1.691583 | -0.187005 | 0.239369 | 0.297889 | 0.241903 | 0.444719 | LSTM | lars49 |
| 2025-03-01 | 1.086629 | -0.026154 | -0.705539 | -1.088655 | -0.908419 | -1.005068 | LSTM | lars49 |
| 2025-06-01 | 0.255034 | 0.926614 | 0.676982 | 0.648158 | 0.395492 | 0.603667 | LSTM | lars49 |
| 2025-09-01 | 0.350465 | 0.315954 | 0.193476 | -0.264647 | -0.701200 | -0.409720 | LSTM | lars49 |
| 2025-12-01 | -7.272064 | 0.345186 | 0.046084 | 0.148406 | -0.016349 | 0.528473 | LSTM | lars49 |
176 rows × 8 columns
Check which model specifications are present in the results data.
results_table['spec'].unique()
array(['lars46', 'lars49'], dtype=object)
Presenting Results¶
Set up plotting libraries and color palette. We import the tools needed to draw charts and define the IDB color scheme used throughout.
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
import seaborn as sns
# Define the color scheme
idb_blue = "#005BAC"
idb_light_blue = "#7FB3E6"
Average RMSE by publication lag¶
all_means = performance_table.groupby(performance_table.index)['RMSE'].mean()  # mean RMSE per vintage, for reference
fig, ax = plt.subplots(figsize=(8, 5))  # adjust figsize as needed
# Plot RMSE against vintage (seaborn aggregates repeated x-values and adds a confidence band)
sns.lineplot(data=performance_table, x='Vintage', y='RMSE', ax=ax, color=idb_blue)
ax.xaxis.set_major_locator(ticker.MaxNLocator(integer=True))
ax.set_title("")
ax.tick_params(axis='x', rotation=45)
plt.tight_layout()
# Save figure
plt.savefig(f'{PATH_OUTPUT}/Figure Vintage.png')
plt.show()
Boxplot by publication lag¶
df = performance_table.reset_index()
fig, ax = plt.subplots(figsize=(8, 5))
# If you want a specific order, define it here; otherwise seaborn will infer it.
# order = sorted(df['Publication lag'].unique())
order = None
sns.boxplot(
data=df,
x='Vintage',
y='RMSE',
order=order,
ax=ax,
boxprops=dict(color=idb_light_blue),
whiskerprops=dict(color=idb_blue),
capprops=dict(color=idb_blue),
medianprops=dict(color='black'),
flierprops=dict(markerfacecolor=idb_light_blue, markeredgecolor=idb_blue,
marker='o', markersize=6)
)
ax.set_title("")
ax.tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
Same boxplot with mean markers added. The dot in each box shows the average RMSE for that publication lag, making it easier to compare central tendency across vintages.
df = performance_table.reset_index()
fig, ax = plt.subplots(figsize=(8, 5))
# If you want a specific order, define it here; otherwise seaborn will infer it.
# order = sorted(df['Publication lag'].unique())
order = None
sns.boxplot(
data=df,
x='Vintage',
y='RMSE',
order=order,
ax=ax,
boxprops=dict(color=idb_light_blue),
whiskerprops=dict(color=idb_blue),
capprops=dict(color=idb_blue),
medianprops=dict(color='black'),
flierprops=dict(markerfacecolor=idb_light_blue, markeredgecolor=idb_blue,
marker='o', markersize=6)
)
# Get the actual category order used on the axis (this is key)
cats = [t.get_text() for t in ax.get_xticklabels()]
# Means computed in the same category space as the plot
means = df.groupby('Vintage', dropna=False)['RMSE'].mean()
# Map tick labels -> x positions
xpos = np.arange(len(cats))
# Overlay means (match tick labels carefully)
ymeans = []
for c in cats:
# Try direct label
if c in means.index:
ymeans.append(means.loc[c])
continue
# Try numeric label -> numeric index match (common when lag is int/float)
try:
cnum = float(c)
hit = None
for idx in means.index:
try:
if float(idx) == cnum:
hit = means.loc[idx]
break
except Exception:
pass
ymeans.append(hit)
except Exception:
ymeans.append(None)
# Plot only the ones we found
mask = [v is not None and np.isfinite(v) for v in ymeans]
ax.scatter(
xpos[mask],
np.array(ymeans, dtype=float)[mask],
color='black',
marker='o',
s=50,
zorder=10,
edgecolor='k',
linewidth=0.4,
label='Mean'
)
# Clean legend (only "Mean" once)
handles, labels = ax.get_legend_handles_labels()
by_label = dict(zip(labels, handles))
ax.legend(by_label.values(), by_label.keys())
ax.set_title("")
ax.tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
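The manual tick-label matching above can usually be avoided: seaborn forwards unrecognized keyword arguments to matplotlib's underlying boxplot, so `showmeans=True` together with `meanprops` draws the mean marker directly. A sketch on invented data, assuming this kwarg pass-through (it holds in current seaborn releases):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented stand-in for the performance table
toy = pd.DataFrame({'Vintage': [-1, -1, 0, 0, 1, 1],
                    'RMSE':    [2.1, 2.5, 2.6, 2.7, 3.1, 12.0]})

fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(data=toy, x='Vintage', y='RMSE', ax=ax,
            showmeans=True,  # let matplotlib place the mean marker
            meanprops=dict(marker='o', markerfacecolor='black',
                          markeredgecolor='black', markersize=6))
fig.savefig('boxplot_with_means.png')
plt.close(fig)
```

This sidesteps the label-to-position bookkeeping entirely, at the cost of less control over the marker's legend entry.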
Histogram: distribution of errors¶
human_mean = performance_table[performance_table['spec']=='human1']['RMSE'].mean()  # NaN if no 'human1' spec is present in this run
plt.figure(figsize=(8, 6))
# Plot with Seaborn, no edge lines
sns.histplot(performance_table['RMSE'], bins=30, kde=False, color=idb_blue, edgecolor=None)
# Human benchmark line
plt.axvline(human_mean, color='black', linestyle=':', linewidth=2, label=f'Human mean = {human_mean:.2f}')
plt.title("")
plt.xlabel("Root Mean Squared Error (RMSE)")
plt.ylabel("Frequency")
plt.legend()
plt.tight_layout()
plt.show()
Filter out extreme outliers using the vintage-specific median RMSE as the threshold
spec_rmse_mean = pd.DataFrame(performance_table.groupby('Vintage')['RMSE'].median())  # per-vintage median (despite the variable name)
spec_rmse_mean = spec_rmse_mean.rename(columns={'RMSE': 'threshold'})
spec_rmse_mean
| threshold | |
|---|---|
| Vintage | |
| -2 | 2.511964 |
| -1 | 2.517836 |
| 0 | 2.613635 |
| 1 | 3.779224 |
| 2 | 7.888829 |
Apply the outlier filter and redraw the histogram. Models whose RMSE exceeds the vintage-specific median are dropped. The histogram now shows the cleaned error distribution alongside the human-forecaster benchmark.
performance_table = performance_table.merge(spec_rmse_mean, on='Vintage', how='left')
# ++++++++++++++++++++++++++++++++++++++++++++++++++++
# ++++++++++++++++++++++++++++++++++++++++++++++++++++
filtered = performance_table[performance_table['RMSE'] < performance_table['threshold']]
# ++++++++++++++++++++++++++++++++++++++++++++++++++++
# ++++++++++++++++++++++++++++++++++++++++++++++++++++
human_mean = performance_table[performance_table['spec']=='human1']['RMSE'].mean()
plt.figure(figsize=(8, 6))
# Plot with Seaborn, no edge lines
sns.histplot(filtered['RMSE'], bins=30, kde=False, color=idb_blue, edgecolor=None)
# Human benchmark line
plt.axvline(human_mean, color='black', linestyle=':', linewidth=2, label=f'Human mean = {human_mean:.2f}')
plt.title("")
plt.xlabel("Root Mean Squared Error (RMSE)")
plt.ylabel("Frequency")
plt.legend()
plt.tight_layout()
plt.show()
Histogram conditional on publication lag¶
human_means = (
performance_table[performance_table['spec'] == 'human1']
.groupby('Vintage')
.mean(numeric_only=True)
)
# Mapping of vintages
lags = {
-2: 'two_back',
-1: 'one_back',
0: 'zero_back',
1: 'one_ahead',
2: 'two_ahead'
}
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(14, 12), sharex=False, sharey=False)
axes = axes.flatten()
# ------------------------------------------------
# Loop over each vintage and plot histogram
for i, (V, N) in enumerate(lags.items()):
ax = axes[i]
# Filter data for this vintage
subset = filtered[filtered.index == V]
# Histogram
sns.histplot(subset['RMSE'], bins=30, kde=False, color=idb_blue, edgecolor=None, ax=ax)
# Median line
rmse_median = subset['RMSE'].median()
ax.axvline(rmse_median, color='black', linestyle='dashed', linewidth=2, label=f'Median = {rmse_median:.2f}')
# Human benchmark
human_mean = human_means[human_means.index==V]['RMSE'].mean()
ax.axvline(human_mean, color='black', linestyle=':', linewidth=2, label=f'Human mean = {human_mean:.2f}')
# Titles
ax.set_title(f'Information set p={V}', fontsize=12)
ax.set_ylabel('Frequency')
ax.legend(fontsize=8, loc='upper right')
# ------------------------------------------------
# Last panel: pooled distribution
ax = axes[-1]
sns.histplot(filtered['RMSE'], bins=30, kde=False, color=idb_blue, edgecolor=None, ax=ax)
rmse_median = filtered['RMSE'].median()
ax.axvline(rmse_median, color='black', linestyle='dashed', linewidth=2, label=f'Median = {rmse_median:.2f}')
human_mean = performance_table[performance_table['spec'] == 'human1']['RMSE'].mean()
ax.axvline(human_mean, color='black', linestyle=':', linewidth=2, label=f'Human mean = {human_mean:.2f}')
ax.set_title('All Vintages', fontsize=12)
ax.set_ylabel('Frequency')
ax.legend(fontsize=8, loc='upper left')
# ------------------------------------------------
# X-labels on bottom row
for ax in axes[-3:]:
ax.set_xlabel("Root Mean Squared Error (RMSE)")
plt.tight_layout()
plt.show()
Boxplot (outliers management)¶
# First we define reference points
human_p = performance_table[performance_table['spec']=='human1']
human_p = human_p.groupby(human_p.index)['RMSE'].mean()
all_means = filtered.groupby(filtered.index)['RMSE'].mean()
fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(
data=filtered.reset_index(),
x='Vintage',
y='RMSE',
ax=ax,
boxprops=dict(color=idb_light_blue),
whiskerprops=dict(color=idb_blue),
capprops=dict(color=idb_blue),
medianprops=dict(color='black'),
flierprops=dict(markerfacecolor=idb_light_blue, markeredgecolor=idb_blue, marker='o', markersize=6)
)
# Ensure the figure is rendered so tick labels are available
fig.canvas.draw() # <- important
xticks = ax.get_xticks()
xtlabels = [t.get_text() for t in ax.get_xticklabels()]
# compute means (keep original dtype)
all_means = filtered.reset_index().groupby('Vintage')['RMSE'].mean()
def find_mean_for_label(lbl):
# direct match
if lbl in all_means.index:
return all_means.loc[lbl]
# try numeric match
try:
lbl_num = float(lbl)
for idx in all_means.index:
try:
if float(idx) == lbl_num:
return all_means.loc[idx]
except Exception:
continue
except Exception:
pass
# fallback: match by string representation
for idx in all_means.index:
if str(idx) == lbl:
return all_means.loc[idx]
return None
# plot mean markers aligned to the xtick positions
for i, lbl in enumerate(xtlabels):
m = find_mean_for_label(lbl)
if m is not None:
ax.scatter(xticks[i], m, color='black', marker='o', s=50, zorder=10,
label='Mean' if i == 0 else "", edgecolor='k', linewidth=0.4)
# overlay human points: match similarly
for i, vintage in enumerate(human_p.index):
# try to find which xtick corresponds
for j, lbl in enumerate(xtlabels):
try:
if str(vintage) == lbl or float(vintage) == float(lbl):
ax.scatter(xticks[j], human_p[vintage], color='black', marker='v', s=50,
label='Human' if i == 0 else "")
break
except Exception:
if str(vintage) == lbl:
ax.scatter(xticks[j], human_p[vintage], color='black', marker='v', s=50,
label='Human' if i == 0 else "")
break
# clean legend
handles, labels = ax.get_legend_handles_labels()
unique = dict(zip(labels, handles))
ax.legend(unique.values(), unique.keys())
ax.set_title("")
ax.tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
Nowcast outputs¶
Results/RMSE merge¶
Filter out results from under-performing models
# -----------------------------------------------
# Reshape results to long format
df = results_table.reset_index().melt(
id_vars=["date", 'spec', 'estimator'],
value_vars=["two_back", "one_back", "zero_back", "one_ahead", "two_ahead"],
var_name="horizon",
value_name="value"
)
# Make Vintage column based on horizon -2, -1, 0, 1, 2
df['Vintage'] = df['horizon'].map({
'two_back': -2,
'one_back': -1,
'zero_back': 0,
'one_ahead': 1,
'two_ahead': 2
})
df
| date | spec | estimator | horizon | value | Vintage | |
|---|---|---|---|---|---|---|
| 0 | 2023-06-01 | lars46 | OLS | two_back | -0.419722 | -2 |
| 1 | 2023-09-01 | lars46 | OLS | two_back | -0.308410 | -2 |
| 2 | 2023-12-01 | lars46 | OLS | two_back | -0.228725 | -2 |
| 3 | 2024-03-01 | lars46 | OLS | two_back | -0.075191 | -2 |
| 4 | 2024-06-01 | lars46 | OLS | two_back | -0.277602 | -2 |
| ... | ... | ... | ... | ... | ... | ... |
| 875 | 2024-12-01 | lars49 | LSTM | two_ahead | 0.444719 | 2 |
| 876 | 2025-03-01 | lars49 | LSTM | two_ahead | -1.005068 | 2 |
| 877 | 2025-06-01 | lars49 | LSTM | two_ahead | 0.603667 | 2 |
| 878 | 2025-09-01 | lars49 | LSTM | two_ahead | -0.409720 | 2 |
| 879 | 2025-12-01 | lars49 | LSTM | two_ahead | 0.528473 | 2 |
880 rows × 6 columns
Attach each model's RMSE to its nowcast values. We join the performance table onto the long-format results so that every prediction row carries the corresponding model error — needed for the quality filter below.
p = performance_table.reset_index()
results_w = (
df # safe even if index is unnamed
.rename(columns=str.strip)
.merge(
p.rename(columns=str.strip)[['Vintage','spec','estimator','RMSE']],
on=['Vintage','spec','estimator'],
how='left'
)
)
results_w
| date | spec | estimator | horizon | value | Vintage | RMSE | |
|---|---|---|---|---|---|---|---|
| 0 | 2023-06-01 | lars46 | OLS | two_back | -0.419722 | -2 | 2.298953 |
| 1 | 2023-09-01 | lars46 | OLS | two_back | -0.308410 | -2 | 2.298953 |
| 2 | 2023-12-01 | lars46 | OLS | two_back | -0.228725 | -2 | 2.298953 |
| 3 | 2024-03-01 | lars46 | OLS | two_back | -0.075191 | -2 | 2.298953 |
| 4 | 2024-06-01 | lars46 | OLS | two_back | -0.277602 | -2 | 2.298953 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 875 | 2024-12-01 | lars49 | LSTM | two_ahead | 0.444719 | 2 | 2.649067 |
| 876 | 2025-03-01 | lars49 | LSTM | two_ahead | -1.005068 | 2 | 2.649067 |
| 877 | 2025-06-01 | lars49 | LSTM | two_ahead | 0.603667 | 2 | 2.649067 |
| 878 | 2025-09-01 | lars49 | LSTM | two_ahead | -0.409720 | 2 | 2.649067 |
| 879 | 2025-12-01 | lars49 | LSTM | two_ahead | 0.528473 | 2 | 2.649067 |
880 rows × 7 columns
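Before filtering on RMSE, it is worth verifying that the join found a performance entry for every prediction row; `indicator=True` flags rows with no match. A sketch on toy frames (column names mirror the real ones, values invented, and `lars99` is a hypothetical spec with no performance entry):

```python
import pandas as pd

# Toy long-format results: the 'lars99' spec has no performance entry
results_long = pd.DataFrame({
    'Vintage': [-1, 0, 1],
    'spec':    ['lars46', 'lars46', 'lars99'],
    'value':   [0.1, 0.2, 0.3],
})
perf = pd.DataFrame({
    'Vintage': [-1, 0],
    'spec':    ['lars46', 'lars46'],
    'RMSE':    [2.3, 2.6],
})

merged = results_long.merge(perf, on=['Vintage', 'spec'], how='left', indicator=True)
n_unmatched = (merged['_merge'] == 'left_only').sum()  # rows whose RMSE is NaN
print(n_unmatched)
```

Unmatched rows would silently survive the `how='left'` merge with `RMSE = NaN` and then be dropped by any RMSE comparison, so counting them first makes the filter's behavior explicit.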
Keep only the better-performing models. For each forecasting horizon we discard models whose RMSE is above the horizon average, retaining only those that have predicted well historically.
mean_rmse = results_w.groupby('Vintage')['RMSE'].mean()
# Keep rows with RMSE <= Vintage-specific mean
filtered = results_w[
results_w['RMSE'] <= results_w['Vintage'].map(mean_rmse)
]
filtered
| date | spec | estimator | horizon | value | Vintage | RMSE | |
|---|---|---|---|---|---|---|---|
| 0 | 2023-06-01 | lars46 | OLS | two_back | -0.419722 | -2 | 2.298953 |
| 1 | 2023-09-01 | lars46 | OLS | two_back | -0.308410 | -2 | 2.298953 |
| 2 | 2023-12-01 | lars46 | OLS | two_back | -0.228725 | -2 | 2.298953 |
| 3 | 2024-03-01 | lars46 | OLS | two_back | -0.075191 | -2 | 2.298953 |
| 4 | 2024-06-01 | lars46 | OLS | two_back | -0.277602 | -2 | 2.298953 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 875 | 2024-12-01 | lars49 | LSTM | two_ahead | 0.444719 | 2 | 2.649067 |
| 876 | 2025-03-01 | lars49 | LSTM | two_ahead | -1.005068 | 2 | 2.649067 |
| 877 | 2025-06-01 | lars49 | LSTM | two_ahead | 0.603667 | 2 | 2.649067 |
| 878 | 2025-09-01 | lars49 | LSTM | two_ahead | -0.409720 | 2 | 2.649067 |
| 879 | 2025-12-01 | lars49 | LSTM | two_ahead | 0.528473 | 2 | 2.649067 |
583 rows × 7 columns
Ridge plot¶
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
df = filtered[['date', 'horizon', 'Vintage', 'value']]
observed = pd.DataFrame(results_table['actuals'].groupby(results_table.index).mean())
actuals_dict = observed['actuals'].to_dict()
df
| date | horizon | Vintage | value | |
|---|---|---|---|---|
| 0 | 2023-06-01 | two_back | -2 | -0.419722 |
| 1 | 2023-09-01 | two_back | -2 | -0.308410 |
| 2 | 2023-12-01 | two_back | -2 | -0.228725 |
| 3 | 2024-03-01 | two_back | -2 | -0.075191 |
| 4 | 2024-06-01 | two_back | -2 | -0.277602 |
| ... | ... | ... | ... | ... |
| 875 | 2024-12-01 | two_ahead | 2 | 0.444719 |
| 876 | 2025-03-01 | two_ahead | 2 | -1.005068 |
| 877 | 2025-06-01 | two_ahead | 2 | 0.603667 |
| 878 | 2025-09-01 | two_ahead | 2 | -0.409720 |
| 879 | 2025-12-01 | two_ahead | 2 | 0.528473 |
583 rows × 4 columns
Draw the ridge plot. Each row is one quarter; the colored curves show how the nowcast values are spread across models for each horizon. This reveals how much uncertainty exists at different information sets.
# Rebuild a long (date, horizon, value) frame from the filtered nowcasts
dates = df["date"].unique()
horizons = ["two_back", "one_back", "zero_back", "one_ahead", "two_ahead"]
data = []
for date in dates:
    for horizon in horizons:
        # All model nowcasts for this quarter at this horizon
        values = df[(df['date'] == date) & (df['horizon'] == horizon)].value
        data.extend([(date, horizon, v) for v in values])
df = pd.DataFrame(data, columns=["date", "horizon", "value"])
horizon_labels = {
"two_back": "p=-2",
"one_back": "p=-1",
"zero_back": "p=0",
"one_ahead": "p=1",
"two_ahead": "p=2"
}
df["horizon"] = df["horizon"].map(horizon_labels)
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Generate a range of colors for the horizons
horizons_unique = df["horizon"].unique()
n_horizons = len(horizons_unique)
pal = sns.cubehelix_palette(n_horizons, rot=-.25, light=.7)
g = sns.FacetGrid(
    df,
    row="date",
    hue="horizon",
    aspect=5,
    height=2,
    palette=pal,
    sharey=True  # share the y-scale so ridge heights are comparable across rows
)
# KDE distributions
g.map(
sns.kdeplot,
"value",
bw_adjust=0.7,
clip_on=False,
fill=True,
alpha=0.6,
linewidth=1.2,
common_norm=True
)
# Contour lines
g.map(
sns.kdeplot,
"value",
bw_adjust=0.7,
clip_on=False,
color="w",
lw=0.5
)
# Adjust spacing between ridges
g.fig.subplots_adjust(hspace=-0.2)
# Y-axis labels as dates, remove density scale
for ax, date in zip(g.axes.flat, df["date"].unique()):
ax.set_ylabel(date.strftime("%Y-%m-%d"), rotation=0, labelpad=50, ha="right", va="center", fontsize=18)
ax.set_yticks([])
ax.axhline(y=0, color="gray", lw=0.6, ls="--", alpha=0.7)
# Overlay the observed value for each quarter as a vertical line
sorted_dates = sorted(df["date"].unique())
for ax, date in zip(g.axes.flat, sorted_dates):
    actual_value = actuals_dict.get(date)
    if actual_value is not None:
        # Vertical line at the observed value; with ax.transData both
        # coordinates are in data units, so y runs from 0 up to 0.9
        line = plt.Line2D(
            [actual_value, actual_value],  # x: the observed value
            [0, 0.9],                      # y: baseline up to 0.9
            transform=ax.transData,
            color='black', linestyle='-', lw=1, alpha=0.8
        )
        ax.add_line(line)
# Cleanup axes
g.set(xlabel="GDP growth QoQ%")
g.set_titles("")
g.despine(bottom=True, left=True)
# Custom white legend
palette_dict = dict(zip(horizons_unique, pal))
handles = [mpatches.Patch(color=palette_dict[h], label=h) for h in horizons_unique]
plt.legend(
handles=handles,
title="Information",
title_fontsize=18,
fontsize=18,
bbox_to_anchor=(1.2, 4),
loc="upper center",
frameon=True,
facecolor="white",
edgecolor="gray",
framealpha=1
)
# Show the figure
plt.show()
Weighted Nowcast¶
We will weight the nowcast by the inverse of the RMSE. Effectively, we will compute:
$$ \hat{v} = \frac{\sum_{i=1}^{N} w_i \, v_i}{\sum_{i=1}^{N} w_i} = \frac{\sum_{i=1}^{N} \dfrac{v_i}{\mathrm{RMSE}_i}} {\sum_{i=1}^{N} \dfrac{1}{\mathrm{RMSE}_i}} $$
We can also estimate a weighted confidence interval using the same weights.
$$ \mathrm{SE}(\hat{v}) = \frac{1}{\sqrt{\sum_{i=1}^{N} w_i}} = \left( \sum_{i=1}^{N} \frac{1}{\mathrm{RMSE}_i} \right)^{-1/2} $$
$$ \hat{v} \pm 1.96 \cdot \mathrm{SE}(\hat{v}) $$
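As a quick numerical check of these formulas, a toy example with three invented nowcasts and RMSEs:

```python
import numpy as np

# Three hypothetical model nowcasts and their historical RMSEs (invented)
values = np.array([0.5, -0.2, 0.1])
rmse   = np.array([2.0, 4.0, 2.5])

weights = 1.0 / rmse                              # w_i = 1 / RMSE_i
v_hat = (values * weights).sum() / weights.sum()  # weighted mean
se = 1.0 / np.sqrt(weights.sum())                 # SE of the weighted mean
ci = (v_hat - 1.96 * se, v_hat + 1.96 * se)       # 95% interval
print(round(v_hat, 4), round(se, 4))
```

Note how the model with the largest RMSE (4.0) contributes the smallest weight (0.25), pulling the combined estimate toward the more accurate models.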
filtered = filtered.copy()
filtered['date'] = pd.to_datetime(filtered['date'])
filtered['weight'] = 1 / filtered['RMSE']
filtered
| date | spec | estimator | horizon | value | Vintage | RMSE | weight | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-06-01 | lars46 | OLS | two_back | -0.419722 | -2 | 2.298953 | 0.434981 |
| 1 | 2023-09-01 | lars46 | OLS | two_back | -0.308410 | -2 | 2.298953 | 0.434981 |
| 2 | 2023-12-01 | lars46 | OLS | two_back | -0.228725 | -2 | 2.298953 | 0.434981 |
| 3 | 2024-03-01 | lars46 | OLS | two_back | -0.075191 | -2 | 2.298953 | 0.434981 |
| 4 | 2024-06-01 | lars46 | OLS | two_back | -0.277602 | -2 | 2.298953 | 0.434981 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 875 | 2024-12-01 | lars49 | LSTM | two_ahead | 0.444719 | 2 | 2.649067 | 0.377491 |
| 876 | 2025-03-01 | lars49 | LSTM | two_ahead | -1.005068 | 2 | 2.649067 | 0.377491 |
| 877 | 2025-06-01 | lars49 | LSTM | two_ahead | 0.603667 | 2 | 2.649067 | 0.377491 |
| 878 | 2025-09-01 | lars49 | LSTM | two_ahead | -0.409720 | 2 | 2.649067 | 0.377491 |
| 879 | 2025-12-01 | lars49 | LSTM | two_ahead | 0.528473 | 2 | 2.649067 | 0.377491 |
583 rows × 8 columns
Pull out the observed (actual) GDP values. These are the true outcomes the models were trying to predict — we will compare nowcasts against them in the next steps.
df = results_table[['actuals']]
df = df.groupby('date').mean()
df.index = pd.to_datetime(df.index)
df
| actuals | |
|---|---|
| date | |
| 2023-06-01 | -0.066758 |
| 2023-09-01 | 1.988200 |
| 2023-12-01 | -0.778219 |
| 2024-03-01 | -0.054704 |
| 2024-06-01 | -0.997453 |
| 2024-09-01 | -1.055556 |
| 2024-12-01 | 1.691583 |
| 2025-03-01 | 1.086629 |
| 2025-06-01 | 0.255034 |
| 2025-09-01 | 0.350465 |
| 2025-12-01 | -7.272064 |
Compute the RMSE-weighted average nowcast for the earliest horizon (two quarters before release). Models with lower historical errors receive more weight, so the combined forecast relies more on models that have been accurate in the past.
# ---------------------------------------------------
V = -2
N = "two_back"
d = filtered.loc[filtered['Vintage'] == V]
weighted = (
(d['value'] * d['weight']).groupby(d['date']).sum()
/ d.groupby(d['date'])['weight'].sum()
).reset_index(name=N).set_index('date')
df = df.merge(weighted, left_index=True, right_index=True, how='left')
df
| actuals | two_back | |
|---|---|---|
| date | ||
| 2023-06-01 | -0.066758 | 0.121110 |
| 2023-09-01 | 1.988200 | 0.129354 |
| 2023-12-01 | -0.778219 | 0.115386 |
| 2024-03-01 | -0.054704 | 0.200174 |
| 2024-06-01 | -0.997453 | 0.022680 |
| 2024-09-01 | -1.055556 | -0.008797 |
| 2024-12-01 | 1.691583 | 0.080847 |
| 2025-03-01 | 1.086629 | -0.123339 |
| 2025-06-01 | 0.255034 | -0.170563 |
| 2025-09-01 | 0.350465 | 0.081047 |
| 2025-12-01 | -7.272064 | -0.154390 |
Repeat the weighted average for the remaining four forecasting horizons, adding each as a new column to the table.
# ---------------------------------------------------
V = -1
N = "one_back"
d = filtered.loc[filtered['Vintage'] == V]
weighted = (
(d['value'] * d['weight']).groupby(d['date']).sum()
/ d.groupby(d['date'])['weight'].sum()
).reset_index(name=N).set_index('date')
df = df.merge(weighted, left_index=True, right_index=True, how='left')
# ---------------------------------------------------
V = 0
N = "zero_back"
d = filtered.loc[filtered['Vintage'] == V]
weighted = (
(d['value'] * d['weight']).groupby(d['date']).sum()
/ d.groupby(d['date'])['weight'].sum()
).reset_index(name=N).set_index('date')
df = df.merge(weighted, left_index=True, right_index=True, how='left')
# ---------------------------------------------------
V = 1
N = "one_ahead"
d = filtered.loc[filtered['Vintage'] == V]
weighted = (
(d['value'] * d['weight']).groupby(d['date']).sum()
/ d.groupby(d['date'])['weight'].sum()
).reset_index(name=N).set_index('date')
df = df.merge(weighted, left_index=True, right_index=True, how='left')
# ---------------------------------------------------
V = 2
N = "two_ahead"
d = filtered.loc[filtered['Vintage'] == V]
weighted = (
(d['value'] * d['weight']).groupby(d['date']).sum()
/ d.groupby(d['date'])['weight'].sum()
).reset_index(name=N).set_index('date')
df = df.merge(weighted, left_index=True, right_index=True, how='left')
df
| actuals | two_back | one_back | zero_back | one_ahead | two_ahead | |
|---|---|---|---|---|---|---|
| date | ||||||
| 2023-06-01 | -0.066758 | 0.121110 | 0.260364 | -0.108956 | 0.290239 | 0.507318 |
| 2023-09-01 | 1.988200 | 0.129354 | 0.101427 | 0.095867 | 0.031705 | 0.458202 |
| 2023-12-01 | -0.778219 | 0.115386 | 0.432420 | 0.643221 | 0.705854 | 0.571956 |
| 2024-03-01 | -0.054704 | 0.200174 | 0.359627 | -0.016257 | -0.148880 | -0.032345 |
| 2024-06-01 | -0.997453 | 0.022680 | 0.080296 | 0.274846 | 1.075381 | 0.317263 |
| 2024-09-01 | -1.055556 | -0.008797 | -0.301066 | -0.384276 | 0.068972 | 0.107290 |
| 2024-12-01 | 1.691583 | 0.080847 | 0.228743 | 0.387874 | 0.125630 | 0.164198 |
| 2025-03-01 | 1.086629 | -0.123339 | -0.125767 | -0.160577 | 0.274556 | 0.131708 |
| 2025-06-01 | 0.255034 | -0.170563 | 0.192939 | 0.253161 | 0.364079 | 0.389159 |
| 2025-09-01 | 0.350465 | 0.081047 | 0.159874 | 0.029010 | 0.505154 | 0.589125 |
| 2025-12-01 | -7.272064 | -0.154390 | -0.104595 | 0.288565 | 1.771885 | 0.338105 |
Alternatively, the same computation written as a loop, starting from scratch:
filtered = filtered.copy()
filtered['date'] = pd.to_datetime(filtered['date'])
filtered['weight'] = 1 / filtered['RMSE']
df = results_table[['actuals']]
df = df.groupby('date').mean()
df.index = pd.to_datetime(df.index)
df
| actuals | |
|---|---|
| date | |
| 2023-06-01 | -0.066758 |
| 2023-09-01 | 1.988200 |
| 2023-12-01 | -0.778219 |
| 2024-03-01 | -0.054704 |
| 2024-06-01 | -0.997453 |
| 2024-09-01 | -1.055556 |
| 2024-12-01 | 1.691583 |
| 2025-03-01 | 1.086629 |
| 2025-06-01 | 0.255034 |
| 2025-09-01 | 0.350465 |
| 2025-12-01 | -7.272064 |
Compact loop version: compute the RMSE-weighted average nowcast for all five horizons at once. This replaces the step-by-step approach above and produces the same result more concisely.
horizons = {
'two_back': -2,
'one_back': -1,
'zero_back': 0,
'one_ahead': 1,
'two_ahead': 2
}
for N, V in horizons.items():
d = filtered.loc[filtered['Vintage'] == V]
weighted = (
(d['value'] * d['weight']).groupby(d['date']).sum()
/ d.groupby(d['date'])['weight'].sum()
).reset_index(name=N).set_index('date')
df = df.merge(weighted, left_index=True, right_index=True, how='left')
df
| actuals | two_back | one_back | zero_back | one_ahead | two_ahead | |
|---|---|---|---|---|---|---|
| date | ||||||
| 2023-06-01 | -0.066758 | 0.121110 | 0.260364 | -0.108956 | 0.290239 | 0.507318 |
| 2023-09-01 | 1.988200 | 0.129354 | 0.101427 | 0.095867 | 0.031705 | 0.458202 |
| 2023-12-01 | -0.778219 | 0.115386 | 0.432420 | 0.643221 | 0.705854 | 0.571956 |
| 2024-03-01 | -0.054704 | 0.200174 | 0.359627 | -0.016257 | -0.148880 | -0.032345 |
| 2024-06-01 | -0.997453 | 0.022680 | 0.080296 | 0.274846 | 1.075381 | 0.317263 |
| 2024-09-01 | -1.055556 | -0.008797 | -0.301066 | -0.384276 | 0.068972 | 0.107290 |
| 2024-12-01 | 1.691583 | 0.080847 | 0.228743 | 0.387874 | 0.125630 | 0.164198 |
| 2025-03-01 | 1.086629 | -0.123339 | -0.125767 | -0.160577 | 0.274556 | 0.131708 |
| 2025-06-01 | 0.255034 | -0.170563 | 0.192939 | 0.253161 | 0.364079 | 0.389159 |
| 2025-09-01 | 0.350465 | 0.081047 | 0.159874 | 0.029010 | 0.505154 | 0.589125 |
| 2025-12-01 | -7.272064 | -0.154390 | -0.104595 | 0.288565 | 1.771885 | 0.338105 |
Graph results¶
plot_df = df.rename(columns={
'actuals': 'observed',
'two_back': 'p=-2',
'one_back': 'p=-1',
'zero_back': 'p=0',
'one_ahead': 'p=1',
'two_ahead': 'p=2'
})
plot_df.index = pd.PeriodIndex(plot_df.index, freq='Q').astype(str)
fig, ax = plt.subplots(figsize=(10,6))
# Plot 'observed' as a black dashed line, no marker
plot_df['observed'].plot(
ax=ax, color='black', linestyle='-', linewidth=2, label='observed'
)
# Define line styles and markers for other lines
line_styles = ['--', '--', '-.', ':', (0, (3, 5, 1, 5))]  # various dash styles
markers = ['x', 'o', '^', 'v', 'D']  # x, circle, triangles up/down, diamond
# Plot the rest with unique line styles and markers
for idx, col in enumerate(['p=-2', 'p=-1', 'p=0', 'p=1', 'p=2']):
plot_df[col].plot(
ax=ax,
color=idb_blue,
linestyle=line_styles[idx],
marker=markers[idx],
linewidth=1.5,
markersize=5,
label=col
)
# Labels and grid
ax.set_title('')
ax.set_xlabel('Quarter', fontsize=12)
ax.set_ylabel('QoQ GDP Growth (%)', fontsize=12)
ax.legend(title='Publication Lag', fontsize=12, title_fontsize=12)
plt.tight_layout()
plt.show()
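The quarter labels on the x-axis come from the `PeriodIndex` conversion above; in isolation it behaves like this (dates invented):

```python
import pandas as pd

# Month-start timestamps become quarterly period labels
dates = pd.to_datetime(['2023-06-01', '2023-09-01', '2023-12-01'])
labels = pd.PeriodIndex(dates, freq='Q').astype(str)
print(list(labels))  # ['2023Q2', '2023Q3', '2023Q4']
```

Converting to strings before plotting makes matplotlib treat the axis as categorical, so quarters are evenly spaced regardless of their calendar gaps.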
Weighted Nowcast with Confidence Intervals¶
df = results_table[['actuals']].groupby('date').mean()
df
| actuals | |
|---|---|
| date | |
| 2023-06-01 | -0.066758 |
| 2023-09-01 | 1.988200 |
| 2023-12-01 | -0.778219 |
| 2024-03-01 | -0.054704 |
| 2024-06-01 | -0.997453 |
| 2024-09-01 | -1.055556 |
| 2024-12-01 | 1.691583 |
| 2025-03-01 | 1.086629 |
| 2025-06-01 | 0.255034 |
| 2025-09-01 | 0.350465 |
| 2025-12-01 | -7.272064 |
Compute weighted average nowcasts and 95% confidence intervals for all horizons. For each horizon we calculate the weighted mean, its standard error (SE), and the interval mean ± 1.96 × SE.
lags = {
-2: 'two_back',
-1: 'one_back',
0: 'zero_back',
1: 'one_ahead',
2: 'two_ahead'
}
for V, N in lags.items():
    # Work on a fresh copy of the filtered nowcasts in each iteration
    New_Table = filtered.copy()
d = New_Table.loc[New_Table['Vintage'] == V]
g = d.groupby('date')
meanN = (
(d['value'] * d['weight']).groupby(d['date']).sum()
/ g['weight'].sum()
).rename(N)
seN = (1 / np.sqrt(g['weight'].sum())).rename(f'se_{N}')
weighted = pd.concat([meanN, seN], axis=1)
# Compute 95% confidence interval bounds
weighted[f'ci_lower_{N}'] = weighted[N] - 1.96 * weighted[f'se_{N}']
weighted[f'ci_upper_{N}'] = weighted[N] + 1.96 * weighted[f'se_{N}']
# Concatenate the new columns to the main df
df = df.join(weighted[[N, f'se_{N}', f'ci_lower_{N}', f'ci_upper_{N}']])
df.index = pd.PeriodIndex(df.index, freq='Q').astype(str)
df
| actuals | two_back | se_two_back | ci_lower_two_back | ci_upper_two_back | one_back | se_one_back | ci_lower_one_back | ci_upper_one_back | zero_back | ... | ci_lower_zero_back | ci_upper_zero_back | one_ahead | se_one_ahead | ci_lower_one_ahead | ci_upper_one_ahead | two_ahead | se_two_ahead | ci_lower_two_ahead | ci_upper_two_ahead | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | |||||||||||||||||||||
| 2023Q2 | -0.066758 | 0.121110 | 0.516343 | -0.890923 | 1.133142 | 0.260364 | 0.448102 | -0.617915 | 1.138644 | -0.108956 | ... | -1.009857 | 0.791945 | 0.290239 | 0.501652 | -0.693000 | 1.273478 | 0.507318 | 0.565639 | -0.601334 | 1.615970 |
| 2023Q3 | 1.988200 | 0.129354 | 0.516343 | -0.882678 | 1.141387 | 0.101427 | 0.448102 | -0.776852 | 0.979707 | 0.095867 | ... | -0.805034 | 0.996768 | 0.031705 | 0.501652 | -0.951534 | 1.014944 | 0.458202 | 0.565639 | -0.650450 | 1.566854 |
| 2023Q4 | -0.778219 | 0.115386 | 0.516343 | -0.896646 | 1.127419 | 0.432420 | 0.448102 | -0.445860 | 1.310699 | 0.643221 | ... | -0.257680 | 1.544122 | 0.705854 | 0.501652 | -0.277385 | 1.689092 | 0.571956 | 0.565639 | -0.536696 | 1.680608 |
| 2024Q1 | -0.054704 | 0.200174 | 0.516343 | -0.811858 | 1.212207 | 0.359627 | 0.448102 | -0.518653 | 1.237906 | -0.016257 | ... | -0.917158 | 0.884644 | -0.148880 | 0.501652 | -1.132119 | 0.834358 | -0.032345 | 0.565639 | -1.140997 | 1.076307 |
| 2024Q2 | -0.997453 | 0.022680 | 0.516343 | -0.989353 | 1.034712 | 0.080296 | 0.448102 | -0.797984 | 0.958575 | 0.274846 | ... | -0.626055 | 1.175747 | 1.075381 | 0.501652 | 0.092143 | 2.058620 | 0.317263 | 0.565639 | -0.791388 | 1.425915 |
| 2024Q3 | -1.055556 | -0.008797 | 0.516343 | -1.020829 | 1.003235 | -0.301066 | 0.448102 | -1.179345 | 0.577214 | -0.384276 | ... | -1.285177 | 0.516625 | 0.068972 | 0.501652 | -0.914267 | 1.052211 | 0.107290 | 0.565639 | -1.001362 | 1.215942 |
| 2024Q4 | 1.691583 | 0.080847 | 0.516343 | -0.931186 | 1.092879 | 0.228743 | 0.448102 | -0.649536 | 1.107023 | 0.387874 | ... | -0.513027 | 1.288775 | 0.125630 | 0.501652 | -0.857609 | 1.108868 | 0.164198 | 0.565639 | -0.944453 | 1.272850 |
| 2025Q1 | 1.086629 | -0.123339 | 0.516343 | -1.135371 | 0.888693 | -0.125767 | 0.448102 | -1.004047 | 0.752512 | -0.160577 | ... | -1.061478 | 0.740324 | 0.274556 | 0.501652 | -0.708682 | 1.257795 | 0.131708 | 0.565639 | -0.976944 | 1.240360 |
| 2025Q2 | 0.255034 | -0.170563 | 0.516343 | -1.182595 | 0.841470 | 0.192939 | 0.448102 | -0.685341 | 1.071218 | 0.253161 | ... | -0.647740 | 1.154062 | 0.364079 | 0.501652 | -0.619159 | 1.347318 | 0.389159 | 0.565639 | -0.719493 | 1.497810 |
| 2025Q3 | 0.350465 | 0.081047 | 0.516343 | -0.930985 | 1.093080 | 0.159874 | 0.448102 | -0.718405 | 1.038154 | 0.029010 | ... | -0.871891 | 0.929911 | 0.505154 | 0.501652 | -0.478084 | 1.488393 | 0.589125 | 0.565639 | -0.519527 | 1.697777 |
| 2025Q4 | -7.272064 | -0.154390 | 0.516343 | -1.166422 | 0.857643 | -0.104595 | 0.448102 | -0.982875 | 0.773684 | 0.288565 | ... | -0.612336 | 1.189466 | 1.771885 | 0.501652 | 0.788646 | 2.755123 | 0.338105 | 0.565639 | -0.770547 | 1.446756 |
11 rows × 21 columns
Plot the weighted nowcast with confidence bands, one panel per horizon. The shaded area shows the 95% interval around the weighted average, illustrating how uncertainty changes across forecasting horizons.
lags = {
-2: 'two_back',
-1: 'one_back',
0: 'zero_back',
1: 'one_ahead',
2: 'two_ahead'
}
fig, axes = plt.subplots(nrows=5, ncols=1, figsize=(10, 20), sharex=True)
# ------------------------------------------------
# Loop over each vintage and plot
for ax, (V, N) in zip(axes, lags.items()):
# Plot actuals
ax.plot(df.index, df['actuals'], color='black', linestyle='-', label='Observed')
# Plot nowcast for vintage N
ax.plot(df.index, df[N], color=idb_blue, linestyle='--', marker='x', label='Weighted Nowcast')
# Plot confidence interval as a shaded area
ax.fill_between(
df.index,
df[f'ci_lower_{N}'],
df[f'ci_upper_{N}'],
color=idb_light_blue,
alpha=0.4,
label='95% CI'
)
# Titles and labels
ax.set_title(f'Information set p={V}', fontsize=12)
ax.set_ylabel('QoQ GDP Growth (%)')
ax.legend(fontsize=9, loc='upper right')
axes[-1].set_xlabel('Quarter')
plt.tight_layout()
plt.show()
Out of sample¶
import os
import pandas as pd
import numpy as np
try:
    date_str
except NameError:
    date_str = "20260505"
PATH_OUTPUT = os.path.join(os.getcwd(), "..", "output", date_str)
PATH_DATA = os.path.join(os.getcwd(), "..", "data")
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
import seaborn as sns
# Define the color scheme
idb_blue = "#005BAC"
idb_light_blue = "#7FB3E6"
'''
# Reload results from disk if running this section standalone
forecast_file = f"{date_str} 3_nwcst Tab fcst.xlsx"
performance_file = f"{date_str} 3_nwcst Tab perf.xlsx"
outofsample_file = f"{date_str} 3_nwcst Tab oos.xlsx"
results_table = pd.read_excel(f"{PATH_OUTPUT}/{forecast_file}", index_col='date', parse_dates=True)
results_table.index = pd.to_datetime(results_table.index, format="%Y-%m-%d")
performance_table = pd.read_excel(f"{PATH_OUTPUT}/{performance_file}", parse_dates=True)
'''
forecast = filtered.copy()
forecast = forecast.dropna(how='all', axis=0)
performance = performance_table.copy()
performance = performance.dropna(how='any', axis=0)
performance['inv_rmse'] = 1 / (performance['RMSE']** 2)
performance['weight'] = performance.groupby('Vintage')['inv_rmse'].transform(lambda x: x / x.sum())
performance.reset_index(drop=False, inplace=True)
performance = performance[['Vintage', 'estimator', 'spec', 'weight']]
performance = performance.set_index('Vintage')
# Map numeric vintages to horizon labels (order-independent, unlike chained str.replace)
vintage_labels = {-2: 'two_back', -1: 'one_back', 0: 'zero_back', 1: 'one_ahead', 2: 'two_ahead'}
performance.index = performance.index.map(vintage_labels)
performance = performance.reset_index()
performance.rename(columns={'Vintage': 'horizon'}, inplace=True)
#forecast = forecast.reset_index().melt(id_vars=['date', 'actuals', 'estimator', 'spec'],
# value_vars=['two_back', 'one_back', 'zero_back', 'one_ahead', 'two_ahead'],
# var_name='Vintage',
# value_name='value')
forecast = forecast.drop(columns=['weight'], errors='ignore')
forecast = forecast.merge(performance, on=['horizon', 'estimator', 'spec'])
forecast = forecast.dropna(how='all', axis=0)
forecast = forecast[~forecast['value'].isna()]
forecast['value'] = ( ((1+ ( forecast['value']/100) )**4)-1 )*100
performance
| horizon | estimator | spec | weight | |
|---|---|---|---|---|
| 0 | two_back | OLS | lars46 | 0.074801 |
| 1 | one_back | OLS | lars46 | 0.076518 |
| 2 | zero_back | OLS | lars46 | 0.064945 |
| 3 | one_ahead | OLS | lars46 | 0.004265 |
| 4 | two_ahead | OLS | lars46 | 0.004540 |
| ... | ... | ... | ... | ... |
| 75 | two_back | LSTM | lars49 | 0.058461 |
| 76 | one_back | LSTM | lars49 | 0.063251 |
| 77 | zero_back | LSTM | lars49 | 0.066015 |
| 78 | one_ahead | LSTM | lars49 | 0.106243 |
| 79 | two_ahead | LSTM | lars49 | 0.113962 |
80 rows × 4 columns
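The weights shown above come from inverse squared RMSE, normalized within each vintage; a toy sketch of that construction on hypothetical RMSE values:

```python
import pandas as pd

# Two hypothetical models in the same vintage, with RMSEs of 2 and 4
perf = pd.DataFrame({'Vintage': [0, 0], 'estimator': ['A', 'B'], 'RMSE': [2.0, 4.0]})

# Inverse squared RMSE: 1/4 and 1/16
perf['inv_rmse'] = 1 / perf['RMSE'] ** 2

# Normalize within each vintage so the weights sum to 1
perf['weight'] = perf.groupby('Vintage')['inv_rmse'].transform(lambda x: x / x.sum())
# The model with half the RMSE receives four times the weight: 0.8 vs 0.2
```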
Build the nowcast summary statistics. Load observed GDP growth, then compute the simple average, the GBT-only average, and the RMSE-weighted nowcast for each quarter.
data = pd.read_csv(f"{PATH_DATA}/data.csv", parse_dates=["date"], index_col='date')
data = data[['RGDP0000']].dropna()
from statsmodels.tsa.seasonal import seasonal_decompose
data['RGDP0000'] = seasonal_decompose(
    data['RGDP0000'], model='additive', extrapolate_trend='freq', period=4
).trend
'''
data = pd.read_csv(f"{PATH_FORMAT}/data_full.csv", parse_dates=["date"], index_col='date')['RGDP0000'].dropna()
'''
gdp = data.resample('QE').agg({'RGDP0000': 'mean'})
gdp.index = gdp.index.to_period('Q').asfreq('M').to_timestamp()
gdp = (gdp.pct_change(periods=1)).dropna()*100
gdp.rename(columns = { 'RGDP0000' : 'actuals' } , inplace=True)
gdp['actuals'] = (((1+gdp['actuals']/100)**4) - 1)*100
actuals = pd.DataFrame(gdp['actuals'].groupby(gdp.index).mean())
avg = pd.DataFrame(forecast.groupby('date')['value'].mean()).rename(columns = {'value' : 'avg'})
est_fcst = pd.DataFrame( forecast.groupby(['date', 'estimator'])['value'].mean() ).rename(columns = {'value' : 'avg'}).reset_index()
gbt_fcst = est_fcst[est_fcst['estimator']=='GBT'].set_index('date')[['avg']].rename(columns = {'avg' : 'GBT'})
avg = pd.concat([avg, gbt_fcst], axis=1)
wavg = (
    forecast.groupby('date')[['value', 'weight']]
    .apply(lambda g: np.average(g['value'], weights=g['weight']) if g['weight'].sum() > 0 else np.nan)
    .reset_index(name='weighted_avg')
    .set_index('date')
)
avg = pd.concat([avg, wavg], axis=1)
avg.index = pd.to_datetime(avg.index)
avg
| avg | GBT | weighted_avg | |
|---|---|---|---|
| date | |||
| 2023-06-01 | 0.670382 | 2.252378 | 1.055284 |
| 2023-09-01 | 0.411530 | 1.445939 | 0.812534 |
| 2023-12-01 | 2.226002 | 2.414542 | 1.994482 |
| 2024-03-01 | 0.354004 | 2.142580 | 0.214557 |
| 2024-06-01 | 2.016769 | 1.439138 | 1.321492 |
| 2024-09-01 | -0.407287 | 0.949355 | -0.387306 |
| 2024-12-01 | 0.690723 | 2.106194 | 0.909163 |
| 2025-03-01 | 0.067927 | 1.357796 | 0.043184 |
| 2025-06-01 | 0.862448 | 2.065746 | 0.950449 |
| 2025-09-01 | 1.123059 | 2.920250 | 1.178173 |
| 2025-12-01 | 3.148724 | 1.637126 | 1.443217 |
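Both the forecasts and the actuals above are converted from quarter-on-quarter to annualized rates by compounding over four quarters, ((1 + g/100)^4 - 1) * 100; a quick sanity check on a hypothetical 1% quarterly rate:

```python
def annualize(qoq_pct):
    """Compound a quarter-on-quarter growth rate (in percent) over four quarters."""
    return ((1 + qoq_pct / 100) ** 4 - 1) * 100

# 1% per quarter compounds to roughly 4.06% annualized (not 4.00%)
growth = annualize(1.0)
```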
Load the out-of-sample prediction panel. We read all model predictions for the upcoming quarter, convert them to annualized growth rates, and keep only models present in the filtered in-sample performance set, merging in each model's historical RMSE (the explicit RMSE threshold is left commented out).
out_of_sample = pd.read_excel(f"{PATH_OUTPUT}/{outofsample_file}", parse_dates=True)
out_of_sample['date'] = pd.to_datetime(out_of_sample['date'])
out_of_sample['nowcast'] = (((1+out_of_sample['nowcast']/100)**4) - 1)*100
# Filter out of sample by RMSE
filtered_grouped = (
filtered[filtered['Vintage'] == 2]
.groupby(['spec', 'estimator'], as_index=False)
.agg({
'RMSE': 'mean',
#'threshold': 'mean'
})
)
out_of_sample = out_of_sample.merge(
filtered_grouped,
on=['spec', 'estimator'],
how='left'
)
out_of_sample.dropna(subset=['RMSE'], inplace=True)
#out_of_sample = out_of_sample[out_of_sample['RMSE'] <= out_of_sample['threshold']]
out_of_sample
| date | nowcast | estimator | spec | RMSE | |
|---|---|---|---|---|---|
| 4 | 2026-03-01 | 11.655263 | DT | lars46 | 2.549988 |
| 5 | 2026-03-01 | 1.608246 | RF | lars46 | 2.564108 |
| 6 | 2026-03-01 | 2.647425 | GBT | lars46 | 2.554940 |
| 11 | 2026-03-01 | 11.655263 | DT | lars49 | 2.566734 |
| 12 | 2026-03-01 | 2.722249 | RF | lars49 | 2.559070 |
| 13 | 2026-03-01 | 2.416893 | GBT | lars49 | 2.599546 |
Compute the weighted average and confidence interval for the out-of-sample nowcast. Models are weighted by their inverse squared historical RMSE (averaged across horizons), and a 5th-95th percentile band is calculated across model predictions.
#lower_bound = out_of_sample['nowcast'].quantile(0.10)
#upper_bound = out_of_sample['nowcast'].quantile(0.90)
#out_of_sample['nowcast'] = out_of_sample['nowcast'].where((out_of_sample['nowcast'] >= lower_bound) & (out_of_sample['nowcast'] <= upper_bound), np.nan)
'''
quantiles = out_of_sample.groupby('date')['nowcast'].quantile([0.10, 0.90]).unstack()
out_of_sample = out_of_sample.merge(quantiles, on='date', suffixes=('', '_quantile'))
out_of_sample['nowcast'] = out_of_sample['nowcast'].where(
(out_of_sample['nowcast'] >= out_of_sample[0.10]) &
(out_of_sample['nowcast'] <= out_of_sample[0.90]),
np.nan
)
out_of_sample = out_of_sample.drop(columns=[0.10, 0.90])
'''
model_weights = performance.groupby(['estimator', 'spec'])['weight'].mean().reset_index()
out_of_sample = out_of_sample.merge(model_weights, on=['estimator', 'spec']).set_index('date')
oos_avg = pd.DataFrame(out_of_sample.groupby(out_of_sample.index)['nowcast'].mean()).rename(columns = {'nowcast' : 'avg'})
gbt_only = out_of_sample[out_of_sample['estimator'] == 'GBT']
oos_gbt = gbt_only.groupby(gbt_only.index)['nowcast'].mean().rename('GBT').to_frame()
oos_avg = pd.concat([oos_avg, oos_gbt], axis=1)
out_of_sample = out_of_sample.dropna(how='any', axis=0)
wavg = (
    out_of_sample.groupby(out_of_sample.index)
    .apply(lambda g: np.average(g['nowcast'], weights=g['weight']) if g['weight'].sum() > 0 else np.nan)
    .reset_index(name='weighted_avg')
    .set_index('date')
)
oos_avg = pd.concat([oos_avg, wavg], axis=1)
dfrange = out_of_sample.groupby(out_of_sample.index)['nowcast'].agg(min='min', max='max').reset_index().set_index('date')
df_ci = (
out_of_sample
.groupby(out_of_sample.index)['nowcast']
.quantile([0.05, 0.95])
.unstack()
.rename(columns={0.05: 'ci_lower', 0.95: 'ci_upper'})
)
oos_avg = pd.concat([oos_avg, dfrange, df_ci], axis=1)
oos_avg.index = pd.to_datetime(oos_avg.index)
oos_avg
| avg | GBT | weighted_avg | min | max | ci_lower | ci_upper | |
|---|---|---|---|---|---|---|---|
| date | |||||||
| 2026-03-01 | 5.45089 | 2.532159 | 5.453093 | 1.608246 | 11.655263 | 1.810408 | 11.655263 |
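The band construction is just per-date quantiles across model predictions, pivoted into columns; a minimal sketch on hypothetical nowcasts:

```python
import pandas as pd

# Four hypothetical model nowcasts for a single date
oos = pd.DataFrame({'date': ['2026-03-01'] * 4,
                    'nowcast': [1.0, 2.0, 3.0, 10.0]})

# 5th/95th percentiles per date, unstacked into ci_lower/ci_upper columns
band = (
    oos.groupby('date')['nowcast']
    .quantile([0.05, 0.95])
    .unstack()
    .rename(columns={0.05: 'ci_lower', 0.95: 'ci_upper'})
)
# With linear interpolation: ci_lower = 1.15, ci_upper = 8.95
```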
Combine in-sample and out-of-sample nowcast summaries into a single table, merging columns with the same name.
data = pd.concat( [ avg , oos_avg ] , axis=1 )
data = data.T.groupby(level=0).sum(min_count=1).T
data
| GBT | avg | ci_lower | ci_upper | max | min | weighted_avg | |
|---|---|---|---|---|---|---|---|
| date | |||||||
| 2023-06-01 | 2.252378 | 0.670382 | NaN | NaN | NaN | NaN | 1.055284 |
| 2023-09-01 | 1.445939 | 0.411530 | NaN | NaN | NaN | NaN | 0.812534 |
| 2023-12-01 | 2.414542 | 2.226002 | NaN | NaN | NaN | NaN | 1.994482 |
| 2024-03-01 | 2.142580 | 0.354004 | NaN | NaN | NaN | NaN | 0.214557 |
| 2024-06-01 | 1.439138 | 2.016769 | NaN | NaN | NaN | NaN | 1.321492 |
| 2024-09-01 | 0.949355 | -0.407287 | NaN | NaN | NaN | NaN | -0.387306 |
| 2024-12-01 | 2.106194 | 0.690723 | NaN | NaN | NaN | NaN | 0.909163 |
| 2025-03-01 | 1.357796 | 0.067927 | NaN | NaN | NaN | NaN | 0.043184 |
| 2025-06-01 | 2.065746 | 0.862448 | NaN | NaN | NaN | NaN | 0.950449 |
| 2025-09-01 | 2.920250 | 1.123059 | NaN | NaN | NaN | NaN | 1.178173 |
| 2025-12-01 | 1.637126 | 3.148724 | NaN | NaN | NaN | NaN | 1.443217 |
| 2026-03-01 | 2.532159 | 5.450890 | 1.810408 | 11.655263 | 11.655263 | 1.608246 | 5.453093 |
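The `sum(min_count=1)` in the merge above is what keeps quarters covered by neither source as NaN rather than 0; a minimal sketch of the behavior on toy frames (using the transposed form, since `groupby(..., axis=1)` is deprecated):

```python
import numpy as np
import pandas as pd

# Two frames sharing the column name 'x', with complementary coverage
a = pd.DataFrame({'x': [1.0, np.nan]})
b = pd.DataFrame({'x': [np.nan, np.nan]})
merged = pd.concat([a, b], axis=1)   # two 'x' columns

# Collapse same-named columns; min_count=1 keeps all-NaN rows as NaN
# (a plain .sum() would turn them into 0)
out = merged.T.groupby(level=0).sum(min_count=1).T
```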
Add the observed GDP growth series to the combined table, so we can compare nowcasts against actual outcomes side by side.
data = pd.concat([ data, actuals ] , axis=1 )
data
| GBT | avg | ci_lower | ci_upper | max | min | weighted_avg | actuals | |
|---|---|---|---|---|---|---|---|---|
| date ||||||||
| 2015-06-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.018851 |
| 2015-09-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.625360 |
| 2015-12-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.640289 |
| 2016-03-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.231093 |
| 2016-06-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.894294 |
| 2016-09-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.801642 |
| 2016-12-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.756087 |
| 2017-03-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.300944 |
| 2017-06-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.832481 |
| 2017-09-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.612075 |
| 2017-12-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.673566 |
| 2018-03-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.524350 |
| 2018-06-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.359652 |
| 2018-09-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.515393 |
| 2018-12-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.325332 |
| 2019-03-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.876144 |
| 2019-06-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.295256 |
| 2019-09-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.620019 |
| 2019-12-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -8.044961 |
| 2020-03-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -12.641871 |
| 2020-06-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -8.487563 |
| 2020-09-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -7.088338 |
| 2020-12-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.957524 |
| 2021-03-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10.977186 |
| 2021-06-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7.542204 |
| 2021-09-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7.974837 |
| 2021-12-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7.132166 |
| 2022-03-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7.071148 |
| 2022-06-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5.733753 |
| 2022-09-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4.120901 |
| 2022-12-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.341701 |
| 2023-03-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.492350 |
| 2023-06-01 | 2.252378 | 0.670382 | NaN | NaN | NaN | NaN | 1.055284 | 2.149795 |
| 2023-09-01 | 1.445939 | 0.411530 | NaN | NaN | NaN | NaN | 0.812534 | 1.412103 |
| 2023-12-01 | 2.414542 | 2.226002 | NaN | NaN | NaN | NaN | 1.994482 | 0.597036 |
| 2024-03-01 | 2.142580 | 0.354004 | NaN | NaN | NaN | NaN | 0.214557 | -1.370031 |
| 2024-06-01 | 1.439138 | 2.016769 | NaN | NaN | NaN | NaN | 1.321492 | -1.657113 |
| 2024-09-01 | 0.949355 | -0.407287 | NaN | NaN | NaN | NaN | -0.387306 | 0.129506 |
| 2024-12-01 | 2.106194 | 0.690723 | NaN | NaN | NaN | NaN | 0.909163 | 1.341081 |
| 2025-03-01 | 1.357796 | 0.067927 | NaN | NaN | NaN | NaN | 0.043184 | 2.689039 |
| 2025-06-01 | 2.065746 | 0.862448 | NaN | NaN | NaN | NaN | 0.950449 | -1.154368 |
| 2025-09-01 | 2.920250 | 1.123059 | NaN | NaN | NaN | NaN | 1.178173 | 3.293706 |
| 2025-12-01 | 1.637126 | 3.148724 | NaN | NaN | NaN | NaN | 1.443217 | 1.360150 |
| 2026-03-01 | 2.532159 | 5.450890 | 1.810408 | 11.655263 | 11.655263 | 1.608246 | 5.453093 | NaN |
Draw the final nowcast chart. Shows observed GDP growth (black), the simple average nowcast (red dotted), and the RMSE-weighted nowcast (indigo dashed) with a grey uncertainty band. Sample is restricted to 2022 onwards.
import matplotlib.pyplot as plt
dataframe = data.copy()
# Collapse the confidence band to a point at the last observed quarter, so the shaded area fans out from there
last_valid_idx = dataframe['actuals'].last_valid_index()
dataframe.at[last_valid_idx, 'ci_lower'] = dataframe.at[last_valid_idx, 'weighted_avg']
dataframe.at[last_valid_idx, 'ci_upper'] = dataframe.at[last_valid_idx, 'weighted_avg']
# Restrict sample
dataframe = dataframe[dataframe.index > "2022-01-01"]
plt.figure(figsize=(10, 6))
# Confidence interval band
plt.fill_between(
dataframe.index,
dataframe['ci_lower'],
dataframe['ci_upper'],
color='grey',
alpha=0.15,
edgecolor='none',
label="_nolegend_"
)
# Observed GDP
plt.plot(
dataframe.index,
dataframe['actuals'],
color='black',
marker='>',
linewidth=2,
label='Observed GDP'
)
# Average nowcast
plt.plot(
dataframe.index,
dataframe['avg'],
color='crimson',
linestyle=":",
marker='x',
markersize=5,
label='Average Nowcast'
)
# Weighted nowcast
plt.plot(
dataframe.index,
dataframe['weighted_avg'],
color='indigo',
linestyle="--",
marker='x',
markersize=8,
label='RMSE-weighted Nowcast'
)
# Zero line
plt.axhline(y=0, color='black', linestyle='--', linewidth=1)
# Labels
plt.xlabel('Date', fontsize=12)
plt.ylabel('Real GDP Annualized Growth Rate (QoQ%)', fontsize=12)
# Rotate x labels
plt.xticks(rotation=45)
# Remove duplicate legend entries (safety)
handles, labels = plt.gca().get_legend_handles_labels()
by_label = dict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys(), loc='lower left')
# Layout and save
plt.tight_layout()
plt.savefig(f"{PATH_OUTPUT}/{date_str} 4_rslt Fig out_of_sample.png",
dpi=300,
bbox_inches='tight')
plt.show()