Session 3: Data

GDP Growth Nowcasting Workshop — Jamaica

Diego A. Guerrero

2026-05-01

Learning Objectives

  • Retrieve data from public APIs
  • Download data directly from websites
  • Download and process Google Trends search data
  • Understand and download NASA Nighttime Lights satellite imagery
  • Apply economic data preprocessing: deflation, seasonal adjustment, growth rates

Scraping Economic Data

Accessing Data on the Web

Application Programming Interfaces (APIs)

APIs expose data through structured HTTP queries. An API key is typically required; always read the provider's documentation.

FRED API

The fredapi library provides a Python interface to the Federal Reserve Economic Data:


import pandas as pd
from fredapi import Fred

fred = Fred(api_key='YOUR_API_KEY_HERE')
sp500 = fred.get_series('SP500')
sp500 = pd.DataFrame(sp500, columns=["SP500"])
sp500.tail(5)

Install with: conda install -c conda-forge fredapi -y

Multiple Queries via Loop

import requests
import pandas as pd

series   = ["SP500", "DJIA", "NASDAQCOM"]
api_key  = "YOUR_API_KEY_HERE"

# Accumulate each series into one DataFrame via outer join on the date index
df = pd.DataFrame()
for serie in series:
    url = (f"https://api.stlouisfed.org/fred/series/observations"
           f"?series_id={serie}&api_key={api_key}&file_type=json")
    resp = requests.get(url).json()
    dfi  = pd.DataFrame(resp["observations"])[["value", "date"]]
    dfi.set_index("date", inplace=True)
    dfi.index = pd.to_datetime(dfi.index)
    dfi.rename(columns={"value": f"fred_{serie}"}, inplace=True)
    dfi[f"fred_{serie}"] = pd.to_numeric(dfi[f"fred_{serie}"], errors="coerce")
    df = df.join(dfi, how="outer") if not df.empty else dfi
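One detail worth knowing: FRED marks missing observations with a "." string in the JSON payload, which is why the loop uses errors="coerce". A minimal illustration:

```python
import pandas as pd

# FRED encodes missing observations as "." in the JSON payload;
# errors="coerce" converts them to NaN instead of raising a ValueError.
raw = pd.Series(["4700.12", ".", "4710.55"])
clean = pd.to_numeric(raw, errors="coerce")
print(clean.isna().sum())  # 1 missing value
```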

Direct Web Requests

When no API is available, files can be retrieved via HTTP:

Simple (pandas)

import pandas as pd

url = "https://example.com/data.csv"
df = pd.read_csv(url)

With requests

import requests, pandas as pd
from io import StringIO

url = "https://example.com/data.csv"
response = requests.get(url)
if response.status_code == 200:
    df = pd.read_csv(StringIO(response.text))
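To see what StringIO contributes, the same parsing step can be exercised offline with a hypothetical CSV payload standing in for response.text:

```python
import pandas as pd
from io import StringIO

# Simulated response body (hypothetical CSV, standing in for response.text)
payload = "date,value\n2024-03-31,1.2\n2024-06-30,1.4\n"
df = pd.read_csv(StringIO(payload), parse_dates=["date"], index_col="date")
print(df.shape)  # (2, 1)
```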

Advanced Requests and Headers

Some websites block requests that do not look like they come from a browser. Adding browser-style headers helps:

import requests

hdr = {
    'User-Agent': (
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/58.0.3029.110 Safari/537.36'
    )
}

url = "https://www.statinja.gov.jm"   # STATIN Jamaica
r = requests.get(url=url, verify=False, headers=hdr).content

  • headers — makes the site treat the request as coming from a browser
  • verify=False — disables SSL certificate checking (use with care)

Parsing HTML with Beautiful Soup

from bs4 import BeautifulSoup
import requests

hdr = {"User-Agent": "Mozilla/5.0"}
url = "https://www.datazoa.com/data/table.asp?a=view&th=69E287AF0E&dzuuid=1835&uid=dzadmin"
r = requests.get(url=url, verify=False, headers=hdr).content
soup = BeautifulSoup(r, "html.parser")

# Collect the href attribute of every anchor tag on the page
links = [link.get("href") for link in soup("a")]
print(links[:5])
['/img/embed/embed.asp?sid=econaccounts/00001045&hash=69E287AF0E&rownum=1&altlabel=Total+Value+Added+at+Basic+Prices ', '/img/embed/embed.asp?sid=econaccounts/00001046&hash=69E287AF0E&rownum=2&altlabel=Agriculture+Forestry+%26+Fishing ', '/img/embed/embed.asp?sid=econaccounts/00001047&hash=69E287AF0E&rownum=3&altlabel=Mining+%26+Quarrying ', '/img/embed/embed.asp?sid=econaccounts/00001048&hash=69E287AF0E&rownum=4&altlabel=Manufacturing ', '/img/embed/embed.asp?sid=econaccounts/00001049&hash=69E287AF0E&rownum=5&altlabel=Food+Beverages+%26+Tobacco ']
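These hrefs are relative, so they must be joined to the site's base URL before downloading. A sketch using two hypothetical hrefs like those above:

```python
from urllib.parse import urljoin

# Hypothetical relative hrefs like those printed above
base = "https://www.datazoa.com"
hrefs = ["/img/embed/embed.asp?sid=econaccounts/00001045&hash=69E287AF0E",
         "/about.asp"]

# Keep only the embed links and convert them to absolute URLs
csv_links = [urljoin(base, h.strip()) for h in hrefs if "embed" in h]
print(csv_links)  # one absolute embed URL
```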

Relevant Literature

Choi & Varian (2012) Predicting the Present with Google Trends

Google Trends predicts current unemployment, auto sales, and tourist destination visits.

Varian (2023) Nowcasting with Google Trends

Best practice: include trends data alongside traditional indicators.

Cebrian & Domenech (2024) Addressing Google Trends Inconsistencies

Inconsistencies across queries require re-scaling.
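One simple re-scaling approach (a sketch, not necessarily the paper's exact method): use an overlapping window between two downloads to estimate a common scale factor. All values below are hypothetical:

```python
import pandas as pd

# Two hypothetical downloads of the same term over overlapping windows;
# Google Trends normalizes each download to its own 0-100 scale.
old = pd.Series([50.0, 80.0, 100.0],
                index=pd.period_range("2023-01", periods=3, freq="M"))
new = pd.Series([40.0, 50.0, 90.0],
                index=pd.period_range("2023-02", periods=3, freq="M"))

# Re-scale the newer chunk so the overlapping months agree on average,
# then splice its non-overlapping part onto the older series
overlap = old.index.intersection(new.index)
factor = old.loc[overlap].mean() / new.loc[overlap].mean()
spliced = pd.concat([old, (new * factor)[~new.index.isin(old.index)]])
print(factor)  # 2.0
```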

Examples used in the nowcasting dataset:

  • jamaica jobs / jobs in jamaica
  • jamaica tourism / airbnb jamaica
  • jamaican dollar / remittance jamaica
  • supermarket jamaica / gracekennedy
  • Retail brands, restaurant chains, tech products

Downloads are available at trends.google.com — export as CSV (monthly, country = Jamaica).

import glob, re, pandas as pd
import os

PATH_RAW = "../data"

csv_files = glob.glob(os.path.join(PATH_RAW, "*multi*.csv"))

df0 = pd.DataFrame()
for csv in csv_files:
    dfi = pd.read_csv(csv, skiprows=1)
    dfi.rename(columns={"Month": "date"}, inplace=True)
    dfi["date"] = pd.to_datetime(dfi["date"])
    dfi.columns = [re.sub(r"\W+", "_", c).lower().strip("_")
                   for c in dfi.columns]
    dfi.set_index("date", inplace=True)
    df0 = pd.concat([df0, dfi], axis=1)

df0.columns = ["gt_" + c for c in df0.columns]
df0.to_csv(f"{PATH_RAW}/gtrends.csv")
print("Last date:", df0.index.max())

Processing Google Trends Data

import pandas as pd, numpy as np

PATH_RAW = "../data"
df = pd.read_csv(f"{PATH_RAW}/gtrends.csv",
                 parse_dates=True, index_col="date")

# Replace zeros with NaN (zero = no data, not zero activity)
df = df.replace(0, np.nan)

# Drop columns with >75% missing
df = df.dropna(axis=1, thresh=int(0.25 * len(df)))

# Create composite index (average across all terms)
df["gt_all_mean"] = df.mean(axis=1)

df["gt_all_mean"].plot(color="#005BAC", figsize=(8, 3))
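A raw mean can be dominated by high-level terms. One alternative worth considering is to z-score each column before averaging; the column names and values below are hypothetical:

```python
import pandas as pd

# Hypothetical trends columns, both on Google's 0-100 scale but at
# different levels; z-scoring keeps one term from dominating the mean
df = pd.DataFrame({
    "gt_jobs":    [20.0, 40.0, 60.0, 80.0],
    "gt_tourism": [70.0, 75.0, 80.0, 85.0],
})
z = (df - df.mean()) / df.std()
df["gt_all_z"] = z.mean(axis=1)
print(df["gt_all_z"].round(3).tolist())  # [-1.162, -0.387, 0.387, 1.162]
```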

Satellite Nighttime Lights

Why Nighttime Lights?

Satellite imagery of artificial light at night provides a proxy for:

  • Economic activity — especially informal sector
  • Electrification and infrastructure
  • Urban growth

Available at fine spatial (500 m) and temporal (monthly) resolution. No publication lag — only processing delay.

NTL in Jamaica

NTL around Hurricane Melissa

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df_island = pd.read_csv('../workshop_code/notebook_files/data_ntl_island.csv',
                        parse_dates=True, index_col='date')
event_date  = pd.Timestamp("2025-10-28")
event_label = "Oct 28, 2025"

fig, ax = plt.subplots(figsize=(12, 4))

ax.plot(df_island.index, df_island["ntl_mean"],
        color="#005BAC", linewidth=1.8, label="NTL mean (nW/cm²/sr)")

ax.axvline(event_date, color="#CC0000", linewidth=1.5, linestyle="--", zorder=3)
ax.text(event_date, ax.get_ylim()[1],
        f" {event_label}", color="#CC0000",
        fontsize=9, va="top", ha="left", rotation=0)

ax.set_title("Jamaica Nighttime Lights (Island Mean)", fontsize=13, weight="bold")
ax.set_ylabel("NTL mean (nW/cm²/sr)")
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonth=[4, 7, 10]))
ax.grid(axis="y", linestyle=":", alpha=0.5)
ax.legend(frameon=False)
plt.tight_layout()
plt.show()
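To quantify the disruption around the event date, one can compare mean radiance before and after it. A sketch on a synthetic series (the values are made up, not Jamaica's actual NTL):

```python
import pandas as pd

# Synthetic monthly NTL series with a drop after a hypothetical event
idx = pd.date_range("2025-01-01", periods=12, freq="MS")
ntl = pd.Series([10.0] * 10 + [6.0, 8.0], index=idx, name="ntl_mean")

event = pd.Timestamp("2025-10-28")
pre  = ntl[ntl.index <  event].mean()
post = ntl[ntl.index >= event].mean()
print(f"pre={pre:.1f}  post={post:.1f}  change={100 * (post / pre - 1):.1f}%")
# pre=10.0  post=7.0  change=-30.0%
```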

NASA Black Marble VIIRS

  • Product: VNP46A3 (monthly, cloud-free composite) and VNP46A2 (daily)
  • Variable: Radiance in nW/cm²/sr
  • Coverage: global, 500 m pixels, from 2012
  • Access: NASA Earthdata — requires a free account and token
  • Python library: blackmarblepy (run in the geo environment)

Workflow Overview

  1. Register at https://ladsweb.modaps.eosdis.nasa.gov/ — obtain a bearer token
  2. Download the shapefile for Jamaica
  3. Run blackmarblepy to download raster tiles (we also have an alternative process)
  4. Aggregate pixel values over Jamaica’s boundary
  5. Save the data

Downloading NTL with blackmarblepy

# Run this in the geo environment: conda activate geo
import geopandas as gpd
import pandas as pd

# Load Jamaica boundary
shape = "https://geodata.ucdavis.edu/gadm/gadm4.1/shp/gadm41_JAM_shp.zip"
gdf = gpd.read_file(shape)
gdf.explore(tiles="CartoDB dark_matter", zoom=11)

Downloading Monthly Composite

from blackmarble.raster import bm_raster
import pandas as pd
import geopandas as gpd

shape = "https://geodata.ucdavis.edu/gadm/gadm4.1/shp/gadm41_JAM_shp.zip"
gdf = gpd.read_file(shape)

bearer = "YOUR_EARTHDATA_TOKEN_HERE"

# Download the VNP46A3 monthly composite as an xarray dataset
# (bm_raster is blackmarblepy's documented entry point)
df_ntl = bm_raster(
    gdf,
    product_id="VNP46A3",
    date_range=pd.date_range("2015-01-01", "2026-01-01", freq="ME"),
    bearer=bearer,
)

# Aggregate: sum all pixels over Jamaica
dfx = (
    df_ntl["NearNadir_Composite_Snow_Free"]
    .sum(dim=["x", "y"])
    .rename("NearNadir_Composite_Snow_Free")
    .to_dataframe()
)
dfx.index.name = "date"
dfx.to_csv("../data/blk_ntl.csv")

Visualizing monthly NTL

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("../data/blk_ntl.csv",
                 parse_dates=True, index_col="date")

df["NearNadir_Composite_Snow_Free"].plot(
    color="#005BAC", linewidth=1.5, figsize=(9, 4)
)
plt.title("Jamaica Nighttime Lights (NTL)", fontsize=13, weight="bold")
plt.ylabel("Sum of Radiance (nW/cm²/sr)")
plt.tight_layout()
plt.show()

Economic Data Preprocessing

Three Standard Transformations

  Step                  Purpose                       Method
  Deflation             Remove price-level changes    Divide by CPI, multiply by base-year CPI
  Seasonal adjustment   Remove calendar seasonality   seasonal_decompose trend component
  Growth rates          Achieve stationarity          Quarter-over-quarter pct_change()

Activity

Build the Merged Dataset

Open workshop_code/s3_build.ipynb and:

  1. Download your own Google Trends data and process it with s3b_gtrends.ipynb — replace zeros with NaN, drop sparse columns, compute gt_all_mean
  2. Examine s3c_blackmarble.ipynb and s3c_ntl.ipynb to process the h5 files into nighttime-lights data
  3. Load data.csv and inspect the GDP series (RGDP0000)
  4. Merge Google Trends and NTL into the main GDP DataFrame
  5. Apply seasonal adjustment and growth-rate transformations
  6. Resample to quarterly
  7. Visualize
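The merge and resampling steps above can be sketched together on toy data (the names gt_all_mean and RGDP0000 match the workshop dataset; the values are hypothetical):

```python
import pandas as pd

# Hypothetical monthly predictor and quarterly GDP growth series
monthly = pd.Series([1.0, 2, 3, 4, 5, 6], name="gt_all_mean",
                    index=pd.date_range("2024-01-01", periods=6, freq="MS"))

# Aggregate months to quarters (mean within each quarter), then join
quarterly = monthly.resample("QS").mean().to_frame()
gdp = pd.DataFrame({"RGDP0000": [0.5, 0.7]},
                   index=pd.date_range("2024-01-01", periods=2, freq="QS"))
merged = gdp.join(quarterly)
print(merged)
```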