fin-glassbox

StemGNN Contagion Risk Module

Project: Explainable Multimodal Neural Framework for Financial Risk Management
Module: GNN Contagion Risk (StemGNN)
File: code/gnn/stemgnn_contagion.py


Table of Contents

  1. Overview
  2. Architecture
  3. Contagion Score Definition
  4. Input Data
  5. Output Data
  6. Model Specifications
  7. Training Protocol
  8. Hyperparameter Optimization
  9. XAI Integration
  10. GPU Optimizations
  11. Dependencies
  12. Usage
  13. File Structure
  14. Validation Metrics

1. Overview

What It Does

The StemGNN Contagion Risk Module answers the question: “If other stocks crash, how badly will this stock get hurt?”

It uses a Spectral Temporal Graph Neural Network (StemGNN) to learn how financial distress propagates through the market. For each of 2,500 stocks, it outputs a contagion probability score (0-1) at three horizons: 5 days, 20 days, and 60 days.

Why StemGNN

StemGNN was chosen over simpler approaches (correlation matrices, linear models) because:

Problem Why StemGNN Solves It
Non-linear contagion Crash propagation is non-linear — StemGNN’s spectral convolutions capture this
Hidden relationships Latent correlation layer learns connections that simple correlation misses
Multi-hop propagation Supplier→MSFT→AAPL: 2-hop paths that correlation can’t see
Temporal dynamics Contagion changes over time — Fourier transforms capture cyclical patterns
Joint learning Learns graph structure AND temporal patterns simultaneously, not separately

Research Foundation

This module builds on two peer-reviewed papers:

  1. Cao et al. (2020) — “Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting” (NeurIPS 2020). The original StemGNN architecture.

  2. Uygun & Sefer (2025) — “Financial asset price prediction with graph neural network-based temporal deep learning models” (Neural Computing and Applications). Validated StemGNN on cryptocurrency and Forex markets, achieving 9.29% MAPE.

Our adaptation: Instead of predicting future prices, we predict contagion-driven extreme negative returns — a risk-focused reformulation.


2. Architecture

High-Level Architecture

returns_panel_wide.csv (2,500 stocks × 6,285 days)
        │
        ▼
┌──────────────────────────────────────┐
│         ContagionDataset             │
│  - Sliding windows (30 days)         │
│  - Builds binary contagion targets   │
│  - Vectorized (no per-stock loops)   │
└──────────────────────────────────────┘
        │
        ▼
┌──────────────────────────────────────┐
│       ContagionStemGNN               │
│  ┌────────────────────────────────┐  │
│  │  StemGNN Base (baseline code)  │  │
│  │  - Latent Correlation Layer    │  │
│  │  - Spectral-Temporal Blocks    │  │
│  │  - Graph Fourier Transform     │  │
│  │  - Discrete Fourier Transform  │  │
│  └────────────────────────────────┘  │
│  ┌────────────────────────────────┐  │
│  │  Contagion Heads (3 horizons)  │  │
│  │  - 5-day contagion probability │  │
│  │  - 20-day contagion probability│  │
│  │  - 60-day contagion probability│  │
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘
        │
        ▼
┌──────────────────────────────────────┐
│         XAI Extraction               │
│  Level 1: Adjacency + Top Influencers│
│  Level 2: Gradient Edge Importance   │
│  Level 3: GNNExplainer (opt-in)      │
└──────────────────────────────────────┘

Baseline Model Internals (from Cao et al. 2020)

The StemGNN core operates in the spectral domain:

  1. Latent Correlation Layer: Uses GRU + self-attention to learn an N×N adjacency matrix from the data. No predefined graph needed — it discovers which stocks influence each other.

  2. Graph Fourier Transform (GFT): Projects the multivariate time series onto eigenvectors of the graph Laplacian. This decomposes the signal into orthogonal components — different “market modes.”

  3. Discrete Fourier Transform (DFT): Transforms each component into the frequency domain. Captures cyclical patterns (weekly, monthly, quarterly).

  4. Spectral-Temporal Blocks: 1D convolutions + Gated Linear Units (GLU) learn patterns in the frequency domain. The spectral representation is cleaner and easier to predict than raw time-domain data.

  5. Inverse Transforms: IDFT → IGFT to return to the original space.

  6. Stack of 2 Blocks: The second block learns residuals from the first, connected by skip connections.

Contagion-Specific Modifications

The baseline StemGNN predicts future price values. We replace its final fc layer with 3 independent contagion heads:

stemgnn_output: (batch, num_nodes, window_size)
        │
        ├──→ Head_5d: Linear(30→15) → LeakyReLU → Dropout → Linear(15→1) → Sigmoid
        ├──→ Head_20d: Linear(30→15) → LeakyReLU → Dropout → Linear(15→1) → Sigmoid
        └──→ Head_60d: Linear(30→15) → LeakyReLU → Dropout → Linear(15→1) → Sigmoid

Each head outputs a probability [0,1] that the stock will experience a contagion-driven extreme negative return at that horizon.


3. Contagion Score Definition

Mathematical Definition

For stock i at time t:

contagion_score_h[i] = P(
    return_i[t:t+h] < threshold_i  AND  |return_i - E[return_i]| > 2 × σ_i
)

Where:

In plain English: A stock gets a high contagion score if:

  1. It suffers an extreme negative return (bottom 5% of its own history), AND
  2. This negative return is much worse than what its own recent behavior would predict

The second condition removes idiosyncratic drops (bad earnings → stock drops alone) from contagion drops (market selloff → stock gets dragged down).

Why Binary Targets?

Binary targets (contagion event: yes/no) are used because:

Multi-Horizon Design

Horizon What It Captures Trading Use
5 days Immediate spillover risk Short-term position adjustment
20 days Medium-term contagion Monthly rebalancing decisions
60 days Systemic risk buildup Strategic allocation changes

Longer horizons are inherently harder to predict but capture slower-building systemic risks (like the 2008 financial crisis, which unfolded over months).


4. Input Data

Primary Input

File Description
data/yFinance/processed/returns_panel_wide.csv Log returns matrix

Input Format

date,A,AA,AAL,...,ZTS
2000-01-04,0.012,-0.008,0.003,...,-0.005
2000-01-05,0.014,-0.006,0.001,...,0.002
...
2024-12-27,0.008,0.011,-0.002,...,0.007
Property Value
Shape 6,285 rows × 2,501 columns (date + 2,500 tickers)
Tickers 2,500 U.S. stocks
Date range 2000-01-04 to 2024-12-27
Frequency Daily (NYSE trading days only)
Return type Log returns: ln(close_t / close_t-1)
NaN rate 0.0%
File size ~277 MB

Data Preprocessing

The ContagionDataset class performs these steps automatically:

  1. Sliding windows: Extracts 30-day windows of returns for all 2,500 stocks
  2. Target construction: Computes binary contagion labels per stock per horizon
  3. Normalization: Z-score normalization fitted on training data only
  4. Memory efficiency: Windows stored as numpy arrays, loaded on-demand by DataLoader

Memory Requirements

Component Memory (approx)
Returns matrix (full) ~125 MB (float32)
Training windows (1,200 windows) ~720 MB
Model parameters (61M) ~244 MB
Adjacency matrix (2,500×2,500) ~50 MB (float32)
Total VRAM (training) ~3-4 GB
GPU required RTX 3090 Ti (24GB) — well within limits

5. Output Data

Contagion Scores

File Description
outputs/results/StemGNN/contagion_scores_chunk{N}_{split}.csv Per-stock contagion probabilities

Format:

ticker,contagion_5d,contagion_20d,contagion_60d
AAPL,0.12,0.18,0.31
MSFT,0.09,0.15,0.28
...
Column Range Description
ticker string Stock symbol
contagion_5d 0.0-1.0 Probability of contagion-driven extreme loss in 5 trading days
contagion_20d 0.0-1.0 Probability in 20 trading days
contagion_60d 0.0-1.0 Probability in 60 trading days

Model Checkpoints

File Description
outputs/models/StemGNN/chunk{N}/best_model.pt Best validation loss model
outputs/models/StemGNN/chunk{N}/latest_model.pt Most recent epoch model

XAI Outputs

File Description
outputs/results/StemGNN/xai/{chunk}_{split}_adjacency.npy Learned N×N adjacency matrix
outputs/results/StemGNN/xai/{chunk}_{split}_top_influencers.json Top-10 influencers per stock
outputs/results/StemGNN/xai/{chunk}_{split}_edge_importance.npy Gradient-based edge importance
outputs/results/StemGNN/xai/{chunk}_{split}_gnnexplainer.json GNNExplainer subgraph (if enabled)

HPO Results

File Description
outputs/codeResults/StemGNN/best_params_chunk{N}.json Best hyperparameters per chunk
outputs/codeResults/StemGNN/hpo.db Optuna study database (SQLite)

6. Model Specifications

Default Hyperparameters

Parameter Default Description
window_size 30 Trading days of lookback
multi_layer 13 Number of StemGNN spectral-temporal blocks
stack_cnt 2 Number of stacked StemGNN blocks
dropout_rate 0.75 Dropout for regularization
leaky_rate 0.2 LeakyReLU slope
contagion_horizons [5, 20, 60] Prediction horizons (trading days)
extreme_quantile 0.05 Bottom 5% of returns = extreme
excess_threshold_std 2.0 Standard deviations for excess negativity

Training Hyperparameters

Parameter Default Description
batch_size 8 Small due to 2,500×2,500 adjacency matrices
epochs 100 With early stopping (patience=20)
learning_rate 0.001 Initial learning rate
decay_rate 0.5 Exponential LR decay multiplier
exponential_decay_step 13 Decay LR every 13 epochs
gradient_clip 1.0 Max gradient norm
optimizer RMSProp From baseline paper
norm_method z_score Input normalization

Model Size

Component Parameters
StemGNN base (GRU + attention + spectral blocks) ~60,968,000
Contagion heads (3 × 2-layer MLP) ~61,773
Total ~61,029,773

Parameter Count

The model has 61 million parameters. Most are in the spectral-temporal blocks (the StockBlockLayer which contains GLU modules and convolution weights). The GRU in the latent correlation layer has window_size × units = 30 × 2500 = 75,000 parameters. The self-attention keys and queries add another ~6,250 each.


7. Training Protocol

Chronological Splits

The module uses the same 3-chunk chronological split as all other project modules:

Chunk Training Validation Testing Label
1 2000-2004 (1,255 days) 2005 (252 days) 2006 (251 days) chunk1
2 2007-2014 (2,014 days) 2015 (252 days) 2016 (252 days) chunk2
3 2017-2022 (1,510 days) 2023 (250 days) 2024 (249 days) chunk3

Training Loop

For each epoch:
  1. Forward pass: returns_matrix → latent_correlation → spectral blocks → contagion heads
  2. Loss: Binary Cross-Entropy between predicted probabilities and contagion targets
  3. Backward pass with gradient clipping (max_norm=1.0)
  4. RMSProp optimizer step
  5. Every exponential_decay_step epochs: multiply LR by decay_rate
  6. Validation: compute BCE on held-out validation period
  7. Save best model (lowest validation loss)
  8. Early stop if no improvement for 20 epochs

Anti-Leakage Rules

  1. Chronological splits: Training data always precedes validation/test data
  2. Normalization: Z-score statistics (mean, std) fitted on training data only, applied to validation/test
  3. Target construction: Contagion labels use only historical data up to time t (2-year rolling window)
  4. No future information: Forward returns are used ONLY as supervised targets, never as input features

8. Hyperparameter Optimization

Search Method

Optuna with TPE (Tree-structured Parzen Estimator) sampler.

Search Space

Parameter Type Range
window_size Categorical {15, 30, 60}
multi_layer Categorical {5, 8, 13, 20}
dropout_rate Categorical {0.5, 0.6, 0.75, 0.8}
learning_rate Log-uniform [1e-4, 1e-2]
decay_rate Categorical {0.3, 0.5, 0.7, 0.9}
exponential_decay_step Categorical {5, 8, 13}
batch_size Categorical {4, 8, 16}

HPO Configuration

Parameter Value
Trials per chunk 50
Startup trials (random) 10
HPO epochs per trial 20
HPO training windows 2,000 (subset for speed)
Pruner Median (n_startup_trials=5)
Storage SQLite (outputs/codeResults/StemGNN/hpo.db)

Objective

Minimize validation binary cross-entropy loss on the contagion prediction task.

Estimated HPO Runtime

Chunk Training Days Windows Est. Time per Trial Total (50 trials)
1 1,255 1,165 ~2-4 min ~2-3 hours
2 2,014 1,924 ~4-6 min ~4-5 hours
3 1,510 1,420 ~3-5 min ~3-4 hours

9. XAI Integration

Level 1: Learned Adjacency + Top Influencers (Always On)

What it provides:

How it works: The StemGNN’s latent_correlation_layer uses GRU + self-attention to learn which stocks influence each other. This adjacency matrix is extracted during the forward pass and averaged across all batches in the prediction set.

Output: {chunk}_{split}_adjacency.npy (2,500×2,500 matrix) + top_influencers.json

Example:

{
  "AAPL": [
    {"ticker": "MSFT", "weight": 0.89},
    {"ticker": "NVDA", "weight": 0.72},
    {"ticker": "QQQ", "weight": 0.68}
  ]
}

Level 2: Gradient-Based Edge Importance (Always On)

What it provides:

How it works: During prediction, we compute ∂(contagion_score) / ∂(input) — how much does each input value affect the contagion score? The gradient magnitude at each input position approximates the importance of that stock’s return for the prediction.

Output: {chunk}_{split}_edge_importance.npy

Level 3: GNNExplainer Subgraph Mask (Opt-in)

What it provides:

How it works: An optimization loop learns a mask over the adjacency matrix. The mask maximizes mutual information between the masked graph and the original prediction while minimizing the number of edges kept.

Enable with: --enable-gnnexplainer flag during predict command.

⚠️ Performance warning: GNNExplainer runs an optimization loop (50 iterations) per explained sample. It can add 1-2 minutes per sample. Use sparingly — for deep-dive explanations, not bulk prediction.

Output: {chunk}_{split}_gnnexplainer.json


10. GPU Optimizations

Applied Optimizations

Optimization What It Does Expected Speedup
Mixed Precision (AMP) Uses float16 for forward/backward, float32 for master weights 1.5-2× faster on RTX 3090 Ti
cudnn.benchmark=True Auto-tunes CUDA convolution algorithms for optimal performance 10-20% faster convolutions
pin_memory=True Allocates DataLoader output in page-locked memory for faster CPU→GPU transfer 20-30% faster data loading
prefetch_factor=4 Preloads 4 batches ahead, keeping GPU continuously fed Eliminates GPU idle time
num_workers=6 Uses 6 CPU threads for parallel data loading (12 threads available) 3-5× faster data loading
Vectorized targets Replaced per-stock Python loops with numpy vectorized operations 10-50× faster dataset construction
non_blocking=True Async CPU→GPU transfers overlap with GPU compute 5-10% end-to-end speedup
zero_grad(set_to_none=True) Sets gradients to None instead of zero (more efficient memory) 5% memory + speed improvement

Memory Management

Expected Training Throughput


11. Dependencies

Python Packages

Package Version Purpose
torch ≥2.2.0 Deep learning framework
numpy ≥1.24 Numerical operations
pandas ≥2.0 Data loading and manipulation
tqdm ≥4.65 Progress bars
optuna ≥3.0 Hyperparameter optimization (optional, for HPO only)

Internal Dependencies

File Purpose
code/gnn/stemgnn_base_model.py Baseline StemGNN Model class (imported)
code/gnn/stemgnn_forecast_dataloader.py Normalization utilities (imported)
data/yFinance/processed/returns_panel_wide.csv Input returns matrix

Baseline Code Attribution

The baseline StemGNN model (stemgnn_base_model.py) is adapted from the official implementation by Cao et al. (2020), available at github.com/microsoft/StemGNN. The data loader and math utilities are adapted from the same repository.


12. Usage

Quick Start

# 1. Verify data availability
python code/gnn/stemgnn_contagion.py inspect

# 2. CPU smoke test (verify training loop works)
python code/gnn/stemgnn_contagion.py train-best --chunk 1 --device cpu --max-train-windows 50

# 3. Full HPO + Training + Prediction pipeline
python code/gnn/stemgnn_contagion.py hpo --chunk 1 --trials 50 --device cuda && \
python code/gnn/stemgnn_contagion.py train-best --chunk 1 --device cuda && \
python code/gnn/stemgnn_contagion.py hpo --chunk 2 --trials 50 --device cuda && \
python code/gnn/stemgnn_contagion.py train-best --chunk 2 --device cuda && \
python code/gnn/stemgnn_contagion.py hpo --chunk 3 --trials 50 --device cuda && \
python code/gnn/stemgnn_contagion.py train-best --chunk 3 --device cuda

# 4. Generate predictions with XAI
python code/gnn/stemgnn_contagion.py predict --chunk 1 --split test --device cuda

CLI Commands

Command Description
inspect Validate input data and show chronological splits
hpo --chunk N Run hyperparameter optimization for chunk N
train-best --chunk N Train with best HPO params (or defaults if no HPO)
train-best-all Train all 3 chunks sequentially
predict --chunk N --split S Generate contagion scores + XAI for chunk N, split S

Common Options

Option Default Description
--device cuda cuda or cpu
--max-train-windows 0 (all) Limit windows for smoke testing
--repo-root . Project root directory
--trials 50 HPO trials per chunk

13. File Structure

fin-glassbox/
├── code/gnn/
│   ├── stemgnn_contagion.py           # ← THIS MODULE
│   ├── stemgnn_base_model.py          # Baseline StemGNN Model
│   ├── stemgnn_forecast_dataloader.py # Normalization utilities
│   ├── stemgnn_handler.py             # Original baseline train/validate
│   ├── stemgnn_utils.py               # Evaluation metrics
│   └── StemGNN.md              # ← THIS FILE
│
├── data/yFinance/processed/
│   └── returns_panel_wide.csv         # Input: 2,500 × 6,285 returns matrix
│
└── outputs/
    ├── models/StemGNN/
    │   ├── chunk1/best_model.pt
    │   ├── chunk2/best_model.pt
    │   └── chunk3/best_model.pt
    │
    ├── results/StemGNN/
    │   ├── contagion_scores_chunk1_test.csv
    │   └── xai/
    │       ├── chunk1_test_adjacency.npy
    │       ├── chunk1_test_top_influencers.json
    │       ├── chunk1_test_edge_importance.npy
    │       └── chunk1_test_gnnexplainer.json
    │
    └── codeResults/StemGNN/
        ├── best_params_chunk1.json
        └── hpo.db

14. Validation Metrics

Training Metrics

Metric Description Target
Binary Cross-Entropy Primary training loss Minimize
Validation BCE Held-out period loss Monitor for early stopping

Contagion Quality Metrics

Metric How to Measure Target
Crisis calibration During 2008, 2020 crashes: do high-score stocks drop more? High-score stocks should show larger drawdowns
Sector specificity Tech sector drop → Tech contagion scores spike, Utilities don’t Sector-specific, not just market beta
Predictive power High contagion score at time t → elevated probability of large negative return at t+1 Positive correlation
Stability Day-to-day score changes should be smooth, not erratic Rolling std < 0.1
Sparsity Learned adjacency should have ~50 meaningful edges per node (√2500) Average degree ≈ 50

Known Limitations

  1. Linear target thresholds: The 5% quantile and 2-std thresholds are fixed. Extreme market regimes may produce too many/few contagion signals.

  2. No fundamental data: Contagion is purely price-based. Does not account for supply-chain relationships, common ownership, or macroeconomic linkages.

  3. Static adjacency per batch: The adjacency matrix is learned per batch but averaged across the prediction set. Dynamic, time-varying adjacency would be more accurate but computationally prohibitive.

  4. Training cost: 61M parameters with 2,500×2,500 matrices requires GPU. CPU training is impractical (10-50× slower).


Document Version: 1.0
Last Updated: 26 April 2026