Session 3: Data

GDP Growth Nowcasting Workshop — Jamaica

Diego A. Guerrero

2026-05-01

Learning Objectives

  • Retrieve data from public APIs
  • Download data directly from websites
  • Download and process Google Trends search data
  • Understand and download NASA Nighttime Lights satellite imagery
  • Apply economic data preprocessing: deflation, seasonal adjustment, growth rates

Scraping Economic Data

Accessing Data on the Web

Application Programming Interfaces (APIs)

APIs expose data through structured HTTP queries. An API key is typically required; always read the provider's documentation.

FRED API

The fredapi library provides a Python interface to the Federal Reserve Economic Data:


import pandas as pd
from fredapi import Fred

fred = Fred(api_key='YOUR_API_KEY_HERE')
sp500 = fred.get_series('SP500')
sp500 = pd.DataFrame(sp500, columns=["SP500"])
sp500.tail(5)

Install with: conda install -c conda-forge fredapi -y

Multiple Queries via Loop

import requests
import pandas as pd

series   = ["SP500", "DJIA", "NASDAQCOM"]
api_key  = "YOUR_API_KEY_HERE"

# Accumulate each series into one DataFrame via outer join on the date index
df = pd.DataFrame()
for serie in series:
    url = (f"https://api.stlouisfed.org/fred/series/observations"
           f"?series_id={serie}&api_key={api_key}&file_type=json")
    resp = requests.get(url).json()
    dfi  = pd.DataFrame(resp["observations"])[["value", "date"]]
    dfi.set_index("date", inplace=True)
    dfi.index = pd.to_datetime(dfi.index)
    dfi.rename(columns={"value": f"fred_{serie}"}, inplace=True)
    dfi[f"fred_{serie}"] = pd.to_numeric(dfi[f"fred_{serie}"], errors="coerce")
    df = df.join(dfi, how="outer") if not df.empty else dfi
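One detail worth knowing: FRED marks missing observations with a "." string in the JSON payload, which is why the loop uses errors="coerce". A minimal illustration:

```python
import pandas as pd

# FRED encodes missing observations as "." in the JSON payload;
# errors="coerce" converts them to NaN instead of raising a ValueError.
raw = pd.Series(["4700.12", ".", "4710.55"])
clean = pd.to_numeric(raw, errors="coerce")
print(clean.isna().sum())  # 1 missing value
```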

Direct Web Requests

When no API is available, files can be retrieved via HTTP:

Simple (pandas)

import pandas as pd

url = "https://example.com/data.csv"
df = pd.read_csv(url)

With requests

import requests, pandas as pd
from io import StringIO

url = "https://example.com/data.csv"
response = requests.get(url)
if response.status_code == 200:
    df = pd.read_csv(StringIO(response.text))
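To see what StringIO contributes, the same parsing step can be exercised offline with a hypothetical CSV payload standing in for response.text:

```python
import pandas as pd
from io import StringIO

# Simulated response body (hypothetical CSV, standing in for response.text)
payload = "date,value\n2024-03-31,1.2\n2024-06-30,1.4\n"
df = pd.read_csv(StringIO(payload), parse_dates=["date"], index_col="date")
print(df.shape)  # (2, 1)
```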

Advanced Requests and Headers

Some websites block requests that do not look like they come from a browser. Adding browser-style headers helps:

import requests

hdr = {
    'User-Agent': (
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/58.0.3029.110 Safari/537.36'
    )
}

url = "https://www.statinja.gov.jm"   # STATIN Jamaica
r = requests.get(url=url, verify=False, headers=hdr).content

  • headers — makes the site treat the request as coming from a browser
  • verify=False — disables SSL certificate checking (use with care)

Parsing HTML with Beautiful Soup

from bs4 import BeautifulSoup
import requests

hdr = {"User-Agent": "Mozilla/5.0"}
url = "https://www.datazoa.com/data/table.asp?a=view&th=69E287AF0E&dzuuid=1835&uid=dzadmin"
r = requests.get(url=url, verify=False, headers=hdr).content
soup = BeautifulSoup(r, "html.parser")

# Collect the href attribute of every anchor tag on the page
links = [link.get("href") for link in soup("a")]
print(links[:5])
['/img/embed/embed.asp?sid=econaccounts/00001045&hash=69E287AF0E&rownum=1&altlabel=Total+Value+Added+at+Basic+Prices ', '/img/embed/embed.asp?sid=econaccounts/00001046&hash=69E287AF0E&rownum=2&altlabel=Agriculture+Forestry+%26+Fishing ', '/img/embed/embed.asp?sid=econaccounts/00001047&hash=69E287AF0E&rownum=3&altlabel=Mining+%26+Quarrying ', '/img/embed/embed.asp?sid=econaccounts/00001048&hash=69E287AF0E&rownum=4&altlabel=Manufacturing ', '/img/embed/embed.asp?sid=econaccounts/00001049&hash=69E287AF0E&rownum=5&altlabel=Food+Beverages+%26+Tobacco ']
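These hrefs are relative, so they must be joined to the site's base URL before downloading. A sketch using two hypothetical hrefs like those above:

```python
from urllib.parse import urljoin

# Hypothetical relative hrefs like those printed above
base = "https://www.datazoa.com"
hrefs = ["/img/embed/embed.asp?sid=econaccounts/00001045&hash=69E287AF0E",
         "/about.asp"]

# Keep only the embed links and convert them to absolute URLs
csv_links = [urljoin(base, h.strip()) for h in hrefs if "embed" in h]
print(csv_links)  # one absolute embed URL
```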

Relevant Literature

Choi & Varian (2012) Predicting the Present with Google Trends

Google Trends predicts current unemployment, auto sales, and tourist destination visits.

Varian (2023) Nowcasting with Google Trends

Best practice: include trends data alongside traditional indicators.

Cebrian & Domenech (2024) Addressing Google Trends Inconsistencies

Inconsistencies across queries require re-scaling.
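One simple re-scaling approach (a sketch, not necessarily the paper's exact method): use an overlapping window between two downloads to estimate a common scale factor. All values below are hypothetical:

```python
import pandas as pd

# Two hypothetical downloads of the same term over overlapping windows;
# Google Trends normalizes each download to its own 0-100 scale.
old = pd.Series([50.0, 80.0, 100.0],
                index=pd.period_range("2023-01", periods=3, freq="M"))
new = pd.Series([40.0, 50.0, 90.0],
                index=pd.period_range("2023-02", periods=3, freq="M"))

# Re-scale the newer chunk so the overlapping months agree on average,
# then splice its non-overlapping part onto the older series
overlap = old.index.intersection(new.index)
factor = old.loc[overlap].mean() / new.loc[overlap].mean()
spliced = pd.concat([old, (new * factor)[~new.index.isin(old.index)]])
print(factor)  # 2.0
```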

Examples used in the nowcasting dataset:

  • jamaica jobs / jobs in jamaica
  • jamaica tourism / airbnb jamaica
  • jamaican dollar / remittance jamaica
  • supermarket jamaica / gracekennedy
  • Retail brands, restaurant chains, tech products

Downloads are available at trends.google.com — export as CSV (monthly, country = Jamaica).

import glob, re, pandas as pd
import os

PATH_RAW = "../data"

csv_files = glob.glob(os.path.join(PATH_RAW, "*multi*.csv"))

df0 = pd.DataFrame()
for csv in csv_files:
    dfi = pd.read_csv(csv, skiprows=1)
    dfi.rename(columns={"Month": "date"}, inplace=True)
    dfi["date"] = pd.to_datetime(dfi["date"])
    dfi.columns = [re.sub(r"\W+", "_", c).lower().strip("_")
                   for c in dfi.columns]
    dfi.set_index("date", inplace=True)
    df0 = pd.concat([df0, dfi], axis=1)

df0.columns = ["gt_" + c for c in df0.columns]
df0.to_csv(f"{PATH_RAW}/gtrends.csv")
print("Last date:", df0.index.max())

Processing Google Trends Data

import pandas as pd, numpy as np

PATH_RAW = "../data"
df = pd.read_csv(f"{PATH_RAW}/gtrends.csv",
                 parse_dates=True, index_col="date")

# Replace zeros with NaN (zero = no data, not zero activity)
df = df.replace(0, np.nan)

# Drop columns with >75% missing
df = df.dropna(axis=1, thresh=int(0.25 * len(df)))

# Create composite index (average across all terms)
df["gt_all_mean"] = df.mean(axis=1)

df["gt_all_mean"].plot(color="#005BAC", figsize=(8, 3))
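A raw mean can be dominated by high-level terms. One alternative worth considering is to z-score each column before averaging; the column names and values below are hypothetical:

```python
import pandas as pd

# Hypothetical trends columns, both on Google's 0-100 scale but at
# different levels; z-scoring keeps one term from dominating the mean
df = pd.DataFrame({
    "gt_jobs":    [20.0, 40.0, 60.0, 80.0],
    "gt_tourism": [70.0, 75.0, 80.0, 85.0],
})
z = (df - df.mean()) / df.std()
df["gt_all_z"] = z.mean(axis=1)
print(df["gt_all_z"].round(3).tolist())  # [-1.162, -0.387, 0.387, 1.162]
```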

Satellite Nighttime Lights

Why Nighttime Lights?

Satellite imagery of artificial light at night provides a proxy for:

  • Economic activity — especially informal sector
  • Electrification and infrastructure
  • Urban growth

Available at fine spatial (500 m) and temporal (monthly) resolution. No publication lag — only processing delay.

NTL in Jamaica

NTL around Hurricane Melissa

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df_island = pd.read_csv('../workshop_code/notebook_files/data_ntl_island.csv',
                        parse_dates=True, index_col='date')
event_date  = pd.Timestamp("2025-10-28")
event_label = "Oct 28, 2025"

fig, ax = plt.subplots(figsize=(12, 4))

ax.plot(df_island.index, df_island["ntl_mean"],
        color="#005BAC", linewidth=1.8, label="NTL mean (nW/cm²/sr)")

ax.axvline(event_date, color="#CC0000", linewidth=1.5, linestyle="--", zorder=3)
ax.text(event_date, ax.get_ylim()[1],
        f" {event_label}", color="#CC0000",
        fontsize=9, va="top", ha="left", rotation=0)

ax.set_title("Jamaica Nighttime Lights (Island Mean)", fontsize=13, weight="bold")
ax.set_ylabel("NTL mean (nW/cm²/sr)")
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonth=[4, 7, 10]))
ax.grid(axis="y", linestyle=":", alpha=0.5)
ax.legend(frameon=False)
plt.tight_layout()
plt.show()
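To quantify the disruption around the event date, one can compare mean radiance before and after it. A sketch on a synthetic series (the values are made up, not Jamaica's actual NTL):

```python
import pandas as pd

# Synthetic monthly NTL series with a drop after a hypothetical event
idx = pd.date_range("2025-01-01", periods=12, freq="MS")
ntl = pd.Series([10.0] * 10 + [6.0, 8.0], index=idx, name="ntl_mean")

event = pd.Timestamp("2025-10-28")
pre  = ntl[ntl.index <  event].mean()
post = ntl[ntl.index >= event].mean()
print(f"pre={pre:.1f}  post={post:.1f}  change={100 * (post / pre - 1):.1f}%")
# pre=10.0  post=7.0  change=-30.0%
```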

NASA Black Marble VIIRS

  • Product: VNP46A3 (monthly, cloud-free composite) and VNP46A2 (daily)
  • Variable: Radiance in nW/cm²/sr
  • Coverage: global, 500 m pixels, from 2012
  • Access: NASA Earthdata — requires a free account and token
  • Python library: blackmarblepy (run in the geo environment)

Workflow Overview

  1. Register at https://ladsweb.modaps.eosdis.nasa.gov/ — obtain a bearer token
  2. Download the shapefile for Jamaica
  3. Run blackmarblepy to download raster tiles (we also have an alternative process)
  4. Aggregate pixel values over Jamaica’s boundary
  5. Save the data

Downloading NTL with blackmarblepy

# Run this in the geo environment: conda activate geo
import geopandas as gpd
import pandas as pd

# Load Jamaica boundary
shape = "https://geodata.ucdavis.edu/gadm/gadm4.1/shp/gadm41_JAM_shp.zip"
gdf = gpd.read_file(shape)
gdf.explore(tiles="CartoDB dark_matter", zoom=11)

Downloading Monthly Composite

from blackmarble.raster import bm_raster
import pandas as pd
import geopandas as gpd

shape = "https://geodata.ucdavis.edu/gadm/gadm4.1/shp/gadm41_JAM_shp.zip"
gdf = gpd.read_file(shape)

bearer = "YOUR_EARTHDATA_TOKEN_HERE"

# Download the VNP46A3 monthly composite as an xarray dataset
# (bm_raster is blackmarblepy's documented entry point)
df_ntl = bm_raster(
    gdf,
    product_id="VNP46A3",
    date_range=pd.date_range("2015-01-01", "2026-01-01", freq="ME"),
    bearer=bearer,
)

# Aggregate: sum all pixels over Jamaica
dfx = (
    df_ntl["NearNadir_Composite_Snow_Free"]
    .sum(dim=["x", "y"])
    .rename("NearNadir_Composite_Snow_Free")
    .to_dataframe()
)
dfx.index.name = "date"
dfx.to_csv("../data/blk_ntl.csv")

Visualizing monthly NTL

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("../data/blk_ntl.csv",
                 parse_dates=True, index_col="date")

df["NearNadir_Composite_Snow_Free"].plot(
    color="#005BAC", linewidth=1.5, figsize=(9, 4)
)
plt.title("Jamaica Nighttime Lights (NTL)", fontsize=13, weight="bold")
plt.ylabel("Sum of Radiance (nW/cm²/sr)")
plt.tight_layout()
plt.show()

Economic Data Preprocessing

Three Standard Transformations

  Step                  Purpose                       Method
  Deflation             Remove price-level changes    Divide by CPI, multiply by base-year CPI
  Seasonal adjustment   Remove calendar seasonality   seasonal_decompose trend component
  Growth rates          Achieve stationarity          Quarter-over-quarter pct_change()

Activity

Build the Merged Dataset

Open workshop_code/s3_build.ipynb and:

  1. Download your own Google Trends data and process it with s3b_gtrends.ipynb — replace zeros with NaN, drop sparse columns, compute gt_all_mean
  2. Examine s3c_blackmarble.ipynb and s3c_ntl.ipynb to process the h5 files into nighttime-lights data
  3. Load data.csv and inspect the GDP series (RGDP0000)
  4. Merge Google Trends and NTL into the main GDP DataFrame
  5. Apply seasonal adjustment and growth-rate transformations
  6. Resample to quarterly
  7. Visualize
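The merge and resampling steps above can be sketched together on toy data (the names gt_all_mean and RGDP0000 match the workshop dataset; the values are hypothetical):

```python
import pandas as pd

# Hypothetical monthly predictor and quarterly GDP growth series
monthly = pd.Series([1.0, 2, 3, 4, 5, 6], name="gt_all_mean",
                    index=pd.date_range("2024-01-01", periods=6, freq="MS"))

# Aggregate months to quarters (mean within each quarter), then join
quarterly = monthly.resample("QS").mean().to_frame()
gdp = pd.DataFrame({"RGDP0000": [0.5, 0.7]},
                   index=pd.date_range("2024-01-01", periods=2, freq="QS"))
merged = gdp.join(quarterly)
print(merged)
```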