Session 2: Python & R Fundamentals

GDP Growth Nowcasting Workshop — Jamaica

Diego A. Guerrero

2026-05-01

Learning Objectives

  • Become familiar with the basic concepts of Python and R
  • Identify and manipulate data structures
  • Run Python / R scripts
  • Load and clean data
  • Compute summary statistics and visualize time series

Introduction to Python

Why Python?

  • Speed, reproducibility, flexibility, and a vast ecosystem of libraries
  • Integrates with R, SQL, Excel, Stata, and APIs
  • Scales from a small dataset to cloud-scale Big Data
  • Free — no licensing costs for government use

Key Libraries

First Python Program

print("Hello, Jamaica!")
print(2 + 5)
Hello, Jamaica!
7

Python Syntax

Variables and Types

country = "Jamaica"
gdp = 20_500      # USD millions (approximate)
growth_rate = 0.035
island = True

print(country, gdp)
print(growth_rate, island)
Jamaica 20500
0.035 True

Variables are stored in RAM — keep datasets manageable.
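
A rough way to see how much memory an object takes is the standard library's sys.getsizeof. The sketch below uses a made-up list of growth rates and assumes the standard CPython interpreter; for a list, the figure covers the list object itself, not the values it points to.

```python
import sys

# Hypothetical data: 1,000 identical growth rates
growth_values = [0.035] * 1_000

# Size of the list object (its array of references), in bytes
list_bytes = sys.getsizeof(growth_values)

print(f"List of {len(growth_values)} items: about {list_bytes / 1024:.1f} KiB")
```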

Data Types

a = 10           # integer
b = 3.14         # float
c = "Economics"  # string
d = True         # boolean

print(type(a), type(b), type(c), type(d))
<class 'int'> <class 'float'> <class 'str'> <class 'bool'>

Common types:

  • int — whole numbers
  • float — decimals
  • str — text
  • bool — True/False
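
Values read from files often arrive as text, so converting between types is a common first step. A minimal sketch with made-up inputs (the variable names are illustrative only):

```python
# Values as they might arrive from a text file
raw_gdp = "20500"
raw_growth = "3.5"

gdp_value = int(raw_gdp)                # str -> int
growth_rate = float(raw_growth) / 100   # str -> float, then percent -> fraction

print(type(gdp_value), type(growth_rate))
print(gdp_value + 1)                    # arithmetic now works: 20501
print(bool(0), bool(gdp_value))         # zero is False, any other number is True
```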

String Formatting

country = "Jamaica"
gdp = 20500 
growth_rate = 0.035

print(f"The country {country} has a GDP of USD {gdp}M "
      f"and grew at {growth_rate*100:.1f}%.")
The country Jamaica has a GDP of USD 20500M and grew at 3.5%.

Arithmetic Operations

gdp_1 = 19890
gdp_2 = 20500

# Growth rate
growth = gdp_2 / gdp_1 - 1

print(f"GDP grew by {round(growth*100, 2)}%.")
GDP grew by 3.07%.

Conditional Statements

Python's comparison operators (>, <, >=, <=, ==, !=) and the membership operator in return booleans (True/False); not negates a boolean.

growth = 0.035

if growth > 0:
    print("Economy expanding")
elif growth == 0:
    print("No growth")
else:
    print("Economy contracting")
Economy expanding

Conditional Example: GDP

gdp_growth = -1.5   # percent

if gdp_growth >= 3:
    print("Strong growth")
elif gdp_growth >= 1:
    print("Moderate growth")
elif gdp_growth >= 0:
    print("Stagnation")
else:
    print("Recession")

print("Program finished.")
Recession
Program finished.

Data Structures (Collections)

Structure    Example                        Key Features
List         [2020, 2021, 2022]             Ordered, mutable
Tuple        (2020, 2021, 2022)             Ordered, immutable
Dictionary   {"year": 2023, "gdp": 3.5}     Key–value pairs
Set          {2020, 2021, 2022}             Unordered, unique
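
The differences between the four structures are easiest to see by using them. A short sketch with illustrative values:

```python
years = [2020, 2021, 2022]             # list: ordered, mutable
years.append(2023)                     # lists can grow in place

period = (2020, 2022)                  # tuple: ordered, immutable
# period[0] = 2019 would raise a TypeError

record = {"year": 2023, "gdp": 3.5}    # dictionary: look up values by key
record["gdp"] = 3.6                    # update the value stored under a key

unique_years = set([2020, 2021, 2021, 2022])   # set: duplicates are dropped

print(years)                  # [2020, 2021, 2022, 2023]
print(period[0])              # 2020
print(record["gdp"])          # 3.6
print(sorted(unique_years))   # [2020, 2021, 2022]
```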

Loops

for — iterate over a sequence:

years = [2020, 2021, 2022, 2023, 2024]
for year in years:
    print(year)
2020
2021
2022
2023
2024

while — repeat while a condition is true:

count = 0
while count < 4:
    print(f"Quarter {count + 1}")
    count += 1
Quarter 1
Quarter 2
Quarter 3
Quarter 4

Loops and Conditionals

gdp_series = [2.1, -3.5, 1.8, 0.4, 3.2, -0.2, 2.9]

for i, g in enumerate(gdp_series):
    if g < 0:
        print(f"Q{i+1}: recession ({g}%)")
    else:
        print(f"Q{i+1}: growth ({g}%)")
Q1: growth (2.1%)
Q2: recession (-3.5%)
Q3: growth (1.8%)
Q4: growth (0.4%)
Q5: growth (3.2%)
Q6: recession (-0.2%)
Q7: growth (2.9%)

Functions

A function is a reusable block of code that performs a specific task.

def annualized_growth(q_growth):
    """Convert quarterly growth to annualized rate."""
    return ((1 + q_growth / 100) ** 4 - 1) * 100

print(f"Annualized: {annualized_growth(0.875):.2f}%")
Annualized: 3.55%

return sends a value back to the caller.
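
To see why return matters, compare a function that returns its result with one that only prints it. Both helper functions below are hypothetical, written for this illustration:

```python
def growth_rate(previous, current):
    """Return percentage growth between two levels."""
    return (current / previous - 1) * 100

def show_growth_rate(previous, current):
    """Print the growth rate but return nothing."""
    print(f"{(current / previous - 1) * 100:.2f}%")

g = growth_rate(19890, 20500)              # the value can be stored and reused
doubled = g * 2
nothing = show_growth_rate(19890, 20500)   # prints, but the result is None

print(round(g, 2), nothing)
```

A function without a return statement hands back None, so its result cannot be used in later calculations.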

Functions and Tables

def make_growth_table(values):
    """Return a list of (quarter, growth) rows."""
    table = []
    for i, v in enumerate(values):
        table.append((f"Q{i+1}", round(v, 2)))
    return table

gdp_q = [0.8, -0.9, 1.1, 0.7]
result = make_growth_table(gdp_q)

print("Quarter | Growth%")
print("--------+--------")
for row in result:
    print(f"{row[0]:7} | {row[1]:6}")
Quarter | Growth%
--------+--------
Q1      |    0.8
Q2      |   -0.9
Q3      |    1.1
Q4      |    0.7

DataFrames

Table-like structure from the pandas library:

import pandas as pd

data = {
    "Country": ["Jamaica", "Trinidad", "Barbados", "Guyana"],
    "GDP_USD_B": [18.5, 28.1, 5.7, 26.8],
    "Population_M": [2.8, 1.4, 0.28, 0.79]
}
df = pd.DataFrame(data)
df
    Country  GDP_USD_B  Population_M
0   Jamaica       18.5          2.80
1  Trinidad       28.1          1.40
2  Barbados        5.7          0.28
3    Guyana       26.8          0.79
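
Columns of a DataFrame can be combined with ordinary arithmetic. As a sketch, the same illustrative figures can be used to derive GDP per capita and sort by it; the new column name GDP_per_capita_USD is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Jamaica", "Trinidad", "Barbados", "Guyana"],
    "GDP_USD_B": [18.5, 28.1, 5.7, 26.8],
    "Population_M": [2.8, 1.4, 0.28, 0.79],
})

# USD billions divided by millions of people gives USD thousands per person
df["GDP_per_capita_USD"] = df["GDP_USD_B"] / df["Population_M"] * 1_000

# Sort from highest to lowest
ranked = df.sort_values("GDP_per_capita_USD", ascending=False)
print(ranked[["Country", "GDP_per_capita_USD"]].round(0))
```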

Data in Python

Importing Data

import pandas as pd

# Load Jamaica data
df = pd.read_csv("../data/data.csv", parse_dates=["date"], index_col="date")
df.head(3)
RGDP0000 UMBL0000 MLIA1001 MLIA1002 MLIA1003 MLIA1004 MLIA1000 MLIA0003 MLIA0002 MLIA0004 ... UGTR0021 UGTR0006 UGTR0001 REXC0001 REXC0002 XEMP0003 XIMP0003 XEMP0004 XIMP0004 UGTR0000
date
2015-02-01 NaN 54304.080413 31890.278 135559.643 249451.351 116008.373 532909.645 52036.644 1501.182 14318.718 ... 10.0 16.0 9.0 NaN NaN NaN NaN 137.776543 25.294407 27.600
2015-03-01 376071.0 55526.300906 29976.701 127010.461 245889.630 112828.497 515705.289 56535.456 588.502 13734.467 ... 9.0 19.0 15.0 NaN NaN NaN NaN 143.741899 24.350571 27.050
2015-04-01 NaN 59615.749986 29459.217 130600.536 249975.925 124345.188 534380.866 45095.208 1278.866 14176.166 ... 11.0 21.0 10.0 NaN NaN NaN NaN 149.343548 27.563384 28.325

3 rows × 992 columns

Importing Various Formats

import pandas as pd

# CSV
df = pd.read_csv("data.csv", parse_dates=["date"], index_col="date")

# Excel
df = pd.read_excel("data.xlsx", sheet_name="Sheet1")

# Text file
df = pd.read_table("data.txt", sep="\t")

Exporting

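# The "output" folder must already exist
df.to_csv("output/data_clean.csv", index=True)          # index=True writes the date index
df.to_excel("output/data_clean.xlsx", index=False)      # index=False leaves the index out
df.to_json("output/data_clean.json", orient="records")  # "records" also omits the index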

JSON is a widely used text format for exchanging data. With orient="records", each row is written as one object (a dictionary of column name: value pairs).
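
The shape of the "records" layout is easiest to see on a tiny table. This sketch uses made-up values and relies on the fact that to_json returns the JSON text when no file path is given:

```python
import json
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({"year": [2023, 2024], "growth": [2.6, -0.5]})

# Without a path argument, to_json returns the JSON text instead of writing a file
text = df.to_json(orient="records")
print(text)

# Reading the text back with the standard library gives a list of dictionaries
records = json.loads(text)
print(records[0]["year"])   # 2023
```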

Data Exploration

import pandas as pd
df = pd.read_csv("../data/data.csv", parse_dates=["date"], index_col="date")
df[["RGDP0000", "UGTR0000", "UMBL0000"]].tail(6)

RGDP0000 UGTR0000 UMBL0000
date
2025-11-01 NaN 50.525 52926.440532
2025-12-01 406765.0 49.650 74554.652345
2026-01-01 NaN 46.125 NaN
2026-02-01 NaN NaN NaN
2026-03-01 NaN NaN NaN
2026-04-01 NaN NaN NaN

DataFrame Summary

import pandas as pd
df = pd.read_csv("../data/data.csv", parse_dates=["date"], index_col="date")

print("=== Info ===")
df.info()                  # prints column names, dtypes, non-null counts (returns None, so no print())

print("=== Summary stats ===")
print(df.describe())       # numeric summary statistics

print("=== Shape ===")
print(df.shape)            # (rows, columns)

print("=== Column names ===")
print(df.columns.tolist()[:10])   # first 10 column names

print("=== Data types ===")
print(df.dtypes)
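
Mixed-frequency data contains many missing values, so counting them per column is a useful companion to the summaries above. A sketch on a small made-up table (column names gdp and trends are illustrative):

```python
import numpy as np
import pandas as pd

# Made-up monthly data: "gdp" is only observed in quarter-end months
idx = pd.date_range("2025-01-01", periods=6, freq="MS")
df = pd.DataFrame({
    "gdp": [np.nan, np.nan, 100.0, np.nan, np.nan, 102.0],
    "trends": [48.0, 47.5, np.nan, 49.0, 50.5, 51.0],
}, index=idx)

missing_per_column = df.isna().sum()      # number of NaN values in each column
share_missing = df.isna().mean() * 100    # percent of rows that are NaN

print(missing_per_column)
print(share_missing.round(1))

gdp_observed = df["gdp"].dropna()         # keep only the observed quarters
print(gdp_observed)
```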

GDP Series

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("../data/data.csv", parse_dates=["date"], index_col="date")
gdp = df["RGDP0000"].dropna()

gdp.plot(color="#005BAC", linewidth=2, figsize=(7, 4))
plt.title("Jamaica Real GDP", fontsize=14, weight="bold")
plt.ylabel("JMD Millions")
plt.xlabel("")
plt.tight_layout()
plt.show()

Selecting and Filtering

import pandas as pd
df = pd.read_csv("../data/data.csv", parse_dates=["date"], index_col="date")

# Select one column
gdp = df["RGDP0000"].dropna()

# Keep observations from 2018 onward
recent = gdp[gdp.index >= "2018-01-01"]

# Describe
recent.describe()
count        32.000000
mean     410715.406250
std       22563.746281
min      338323.000000
25%      400882.000000
50%      408963.000000
75%      428480.000000
max      438665.000000
Name: RGDP0000, dtype: float64

Complex Filtering

import pandas as pd
df = pd.read_csv("../data/data.csv", parse_dates=["date"], index_col="date")
gdp = df["RGDP0000"].dropna()

# Keep only quarters with negative quarter-on-quarter growth
gdp_growth = gdp.pct_change() * 100
recession = gdp_growth[gdp_growth < 0]
print(recession)
date
2015-06-01    -1.270239
2015-09-01    -0.988166
2020-03-01    -0.591655
2020-06-01   -16.689321
2022-12-01    -0.033343
2023-06-01    -0.066758
2023-12-01    -0.778219
2024-03-01    -0.054704
2024-06-01    -0.997453
2024-09-01    -1.055556
2025-12-01    -7.272064
Name: RGDP0000, dtype: float64
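
Several conditions can be combined in one filter. The sketch below uses made-up growth rates; the last filter applies a common rule of thumb (two consecutive negative quarters), which is an assumption of this example and not a definition used elsewhere in the workshop.

```python
import pandas as pd

# Made-up quarterly growth rates (percent)
growth = pd.Series(
    [0.8, -0.9, -1.2, 0.4, 3.1, -0.1],
    index=pd.date_range("2024-03-01", periods=6, freq="3MS"),
)

# Each condition goes in parentheses; & means "and", | means "or"
large_moves = growth[(growth < -1) | (growth > 3)]
declines_2025 = growth[(growth < 0) & (growth.index >= "2025-01-01")]

# Negative growth in both the current and the previous quarter
two_in_a_row = growth[(growth < 0) & (growth.shift(1) < 0)]

print(large_moves)
print(declines_2025)
print(two_in_a_row)
```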

Combining Datasets: concat, merge, join

Method        Use                              Based On      Example                         Notes
pd.concat()   Stack rows or columns            Axis          pd.concat([df1, df2])           Simple append
pd.merge()    SQL-style JOIN on key columns    Key columns   pd.merge(df1, df2, on="date")   Most flexible
df.join()     Combine on index                 Index         df1.join(df2)                   Convenient for time series
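
The three methods can be compared side by side on small tables. All values and column names below are made up for illustration:

```python
import pandas as pd

dates = pd.to_datetime(["2025-03-01", "2025-06-01", "2025-09-01"])
gdp = pd.DataFrame({"date": dates, "gdp": [100.0, 101.0, 102.5]})
trends = pd.DataFrame({"date": dates[1:], "trends": [45.8, 50.2]})

# concat: stack the rows of tables that share the same columns
later = pd.DataFrame({"date": pd.to_datetime(["2025-12-01"]), "gdp": [103.0]})
stacked = pd.concat([gdp, later], ignore_index=True)

# merge: SQL-style join on a key column; how="left" keeps every row of gdp
merged = pd.merge(gdp, trends, on="date", how="left")

# join: the same idea, but matching on the index
joined = gdp.set_index("date").join(trends.set_index("date"), how="inner")

print(stacked.shape, merged.shape, joined.shape)
```

With how="left" the first quarter keeps its GDP value and gets NaN for trends; with how="inner" only the two dates present in both tables survive.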

Merge Example

import pandas as pd

df = pd.read_csv("../data/data.csv", parse_dates=["date"], index_col="date")

# Select GDP and Google Trends index
gdp   = df[["RGDP0000"]].dropna()
gtrends = df[["UGTR0000"]].dropna()

# Merge on date index
merged = gdp.join(gtrends, how="inner")
merged.tail(6)
RGDP0000 UGTR0000
date
2024-09-01 424159.0 47.950
2024-12-01 431334.0 48.275
2025-03-01 436021.0 47.875
2025-06-01 437133.0 45.775
2025-09-01 438665.0 50.175
2025-12-01 406765.0 49.650

Reshaping

wide → long: melt()

# "date" is the index here, so turn it back into a column before melting
long_df = df.reset_index().melt(id_vars="date",
                                value_vars=["RGDP0000", "UGTR0000"],
                                var_name="variable",
                                value_name="value")
long_df.head()

long → wide: pivot()

wide_df = long_df.pivot(index="date", columns="variable", values="value")
wide_df.head()
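
A round trip on a tiny made-up table shows what each step does to the shape of the data (column names are illustrative):

```python
import pandas as pd

wide = pd.DataFrame({
    "date": pd.to_datetime(["2025-03-01", "2025-06-01"]),
    "gdp": [100.0, 101.0],
    "trends": [47.9, 45.8],
})

# wide -> long: one row per (date, variable) pair
long_df = wide.melt(id_vars="date", var_name="variable", value_name="value")

# long -> wide: one column per variable again, indexed by date
back = long_df.pivot(index="date", columns="variable", values="value")

print(long_df)
print(back)
```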

Resample

Transforms data to a different frequency (e.g. monthly → quarterly):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("../data/data.csv", parse_dates=["date"], index_col="date")

# Google Trends — monthly
gt_monthly = df["UGTR0000"].dropna()

# Resample to quarterly mean
gt_quarterly = gt_monthly.resample("QS").mean()

plt.figure(figsize=(8, 4))
plt.plot(gt_monthly.index, gt_monthly, alpha=0.4, label="Monthly")
plt.plot(gt_quarterly.index, gt_quarterly, color="red",
         linewidth=2, label="Quarterly")
plt.legend(frameon=False)
plt.title("Google Trends Index — Monthly vs Quarterly")
plt.show()

Resample Frequencies

Code   Frequency                      Example Date
"D"    Daily                          2026-01-01
"W"    Weekly (weeks end on Sunday)   2026-01-04
"MS"   Month-start                    2026-01-01
"QS"   Quarter-start                  2026-01-01
"YS"   Year-start                     2026-01-01

"AS" is the older alias for "YS"; pandas 2.2 and later flag it as deprecated.
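
How each code labels its periods can be checked on a made-up daily series covering the first quarter of 2026. This is a minimal sketch; every day has the value 1, so the sums count the days in each period.

```python
import pandas as pd

# Made-up daily series: one observation per day, each equal to 1
daily = pd.Series(1.0, index=pd.date_range("2026-01-01", "2026-03-31", freq="D"))

monthly = daily.resample("MS").sum()     # labelled by the first day of the month
quarterly = daily.resample("QS").sum()   # labelled by the first day of the quarter
weekly = daily.resample("W").sum()       # labelled by the Sunday that ends the week

print(monthly)
print(quarterly)
print(weekly.index[0], weekly.index[0].day_name())
```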

Activity

Loading and Visualizing Jamaica Data

Open workshop_code/s2_basic.ipynb and:

  1. Load two datasets: gdp.csv and sp500.csv
  2. Perform basic cleaning and merge
  3. Visualize
  4. Compute growth rates for each variable
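
As a starting point, the cleaning, merging, and growth steps might look like the sketch below. It uses small made-up tables in place of gdp.csv and sp500.csv, whose actual column names and frequencies are not shown here, so every name and value is an assumption.

```python
import pandas as pd

# Hypothetical stand-ins for the two files
dates = pd.to_datetime(["2025-03-01", "2025-06-01", "2025-09-01", "2025-12-01"])
gdp = pd.DataFrame({"gdp": [100.0, 101.0, 102.5, float("nan")]}, index=dates)
sp500 = pd.DataFrame({"sp500": [5600.0, 6200.0, 6700.0, 6850.0]}, index=dates)

# Step 2: drop missing observations, then keep dates present in both tables
gdp_clean = gdp.dropna()
merged = gdp_clean.join(sp500, how="inner")

# Step 4: period-on-period growth in percent
growth = merged.pct_change().dropna() * 100

print(merged)
print(growth.round(2))
```

For step 3, merged.plot(subplots=True) followed by plt.show() (with matplotlib.pyplot imported as plt) draws one panel per variable.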

Appendix — Visualization

Line Plot

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("../data/data.csv", parse_dates=["date"], index_col="date")
gdp = df["RGDP0000"].dropna()
gdp_g = gdp.pct_change() * 100

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(gdp_g.index, gdp_g, color="#005BAC", linewidth=2)
ax.axhline(0, color="grey", linestyle="--", linewidth=0.8)
ax.set_title("Jamaica Real GDP Growth (QoQ%)", fontsize=14, weight="bold")
ax.set_ylabel("Percent")
plt.tight_layout()
plt.show()

Bar Chart

import matplotlib.pyplot as plt

sectors = ["Agriculture", "Manufacturing", "Tourism", "Finance", "Other"]
shares  = [6, 9, 30, 18, 37]

fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(sectors, shares,
       color=["#1f77b4", "#ff7f0e", "#2ca02c", "#9467bd", "#d62728"])
ax.set_title("Jamaica GDP by Sector (approx.)", fontsize=13, weight="bold")
ax.set_ylabel("Percent of GDP")
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()

Scatter Plot

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("../data/data.csv", parse_dates=["date"], index_col="date")
df_q = df[["RGDP0000", "UGTR0000"]].resample("QS").mean()
# Drop incomplete quarters first so pct_change() only compares observed values
df_q = df_q.dropna().pct_change().dropna() * 100

fig, ax = plt.subplots(figsize=(7, 5))
ax.scatter(df_q["UGTR0000"], df_q["RGDP0000"],
           s=70, color="#005BAC", alpha=0.7, edgecolors="white")
ax.set_title("Google Trends vs GDP Growth", fontsize=13, weight="bold")
ax.set_xlabel("Google Trends Growth (%)")
ax.set_ylabel("GDP Growth (%)")
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()