fin-glassbox

StemGNN Contagion Risk Module

Project: Explainable Multimodal Neural Framework for Financial Risk Management
Module: GNN Contagion Risk (StemGNN)
File: code/gnn/stemgnn_contagion.py

Overview
Architecture
Contagion Score Definition
Input Data
Output Data
Model Specifications
Training Protocol
Hyperparameter Optimization
XAI Integration
GPU Optimizations
Dependencies
Usage
File Structure
Validation Metrics

1. Overview

What It Does

The StemGNN Contagion Risk Module answers the question: “If other stocks crash, how badly will this stock get hurt?”

It uses a Spectral Temporal Graph Neural Network (StemGNN) to learn how financial distress propagates through the market. For each of 2,500 stocks, it outputs a contagion probability score (0-1) at three horizons: 5 days, 20 days, and 60 days.

Why StemGNN

StemGNN was chosen over simpler approaches (correlation matrices, linear models) because:

Problem	Why StemGNN Solves It
Non-linear contagion	Crash propagation is non-linear — StemGNN’s spectral convolutions capture this
Hidden relationships	Latent correlation layer learns connections that simple correlation misses
Multi-hop propagation	Supplier→MSFT→AAPL: 2-hop paths that correlation can’t see
Temporal dynamics	Contagion changes over time — Fourier transforms capture cyclical patterns
Joint learning	Learns graph structure AND temporal patterns simultaneously, not separately

Research Foundation

This module builds on two peer-reviewed papers:

Cao et al. (2020) — “Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting” (NeurIPS 2020). The original StemGNN architecture.
Uygun & Sefer (2025) — “Financial asset price prediction with graph neural network-based temporal deep learning models” (Neural Computing and Applications). Validated StemGNN on cryptocurrency and Forex markets, achieving 9.29% MAPE.

Our adaptation: Instead of predicting future prices, we predict contagion-driven extreme negative returns — a risk-focused reformulation.

2. Architecture

High-Level Architecture

returns_panel_wide.csv (2,500 stocks × 6,285 days)
        │
        ▼
┌──────────────────────────────────────┐
│         ContagionDataset             │
│  - Sliding windows (30 days)         │
│  - Builds binary contagion targets   │
│  - Vectorized (no per-stock loops)   │
└──────────────────────────────────────┘
        │
        ▼
┌──────────────────────────────────────┐
│       ContagionStemGNN               │
│  ┌────────────────────────────────┐  │
│  │  StemGNN Base (baseline code)  │  │
│  │  - Latent Correlation Layer    │  │
│  │  - Spectral-Temporal Blocks    │  │
│  │  - Graph Fourier Transform     │  │
│  │  - Discrete Fourier Transform  │  │
│  └────────────────────────────────┘  │
│  ┌────────────────────────────────┐  │
│  │  Contagion Heads (3 horizons)  │  │
│  │  - 5-day contagion probability │  │
│  │  - 20-day contagion probability│  │
│  │  - 60-day contagion probability│  │
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘
        │
        ▼
┌──────────────────────────────────────┐
│         XAI Extraction               │
│  Level 1: Adjacency + Top Influencers│
│  Level 2: Gradient Edge Importance   │
│  Level 3: GNNExplainer (opt-in)      │
└──────────────────────────────────────┘

Baseline Model Internals (from Cao et al. 2020)

The StemGNN core operates in the spectral domain:

Latent Correlation Layer: Uses GRU + self-attention to learn an N×N adjacency matrix from the data. No predefined graph needed — it discovers which stocks influence each other.
Graph Fourier Transform (GFT): Projects the multivariate time series onto eigenvectors of the graph Laplacian. This decomposes the signal into orthogonal components — different “market modes.”
Discrete Fourier Transform (DFT): Transforms each component into the frequency domain. Captures cyclical patterns (weekly, monthly, quarterly).
Spectral-Temporal Blocks: 1D convolutions + Gated Linear Units (GLU) learn patterns in the frequency domain. The spectral representation is cleaner and easier to predict than raw time-domain data.
Inverse Transforms: IDFT → IGFT to return to the original space.
Stack of 2 Blocks: The second block learns residuals from the first, connected by skip connections.

Contagion-Specific Modifications

The baseline StemGNN predicts future price values. We replace its final fc layer with 3 independent contagion heads:

stemgnn_output: (batch, num_nodes, window_size)
        │
        ├──→ Head_5d: Linear(30→15) → LeakyReLU → Dropout → Linear(15→1) → Sigmoid
        ├──→ Head_20d: Linear(30→15) → LeakyReLU → Dropout → Linear(15→1) → Sigmoid
        └──→ Head_60d: Linear(30→15) → LeakyReLU → Dropout → Linear(15→1) → Sigmoid

Each head outputs a probability [0,1] that the stock will experience a contagion-driven extreme negative return at that horizon.

3. Contagion Score Definition

Mathematical Definition

For stock i at time t:

contagion_score_h[i] = P(
    return_i[t:t+h] < threshold_i  AND  |return_i - E[return_i]| > 2 × σ_i
)

Where:

threshold_i = 5th percentile of stock i’s historical 2-year returns (extreme negative)
E[return_i] = Expected return based on stock’s own 60-day history
σ_i = Standard deviation of stock’s 60-day returns

In plain English: A stock gets a high contagion score if:

It suffers an extreme negative return (bottom 5% of its own history), AND
This negative return is much worse than what its own recent behavior would predict

The second condition removes idiosyncratic drops (bad earnings → stock drops alone) from contagion drops (market selloff → stock gets dragged down).

Why Binary Targets?

Binary targets (contagion event: yes/no) are used because:

Position Sizing Engine needs clear risk flags, not continuous values
Binary classification is more robust to the extreme noise in financial returns
The sigmoid output naturally provides a well-calibrated probability

Multi-Horizon Design

Horizon	What It Captures	Trading Use
5 days	Immediate spillover risk	Short-term position adjustment
20 days	Medium-term contagion	Monthly rebalancing decisions
60 days	Systemic risk buildup	Strategic allocation changes

Longer horizons are inherently harder to predict but capture slower-building systemic risks (like the 2008 financial crisis, which unfolded over months).

4. Input Data

Primary Input

File	Description
`data/yFinance/processed/returns_panel_wide.csv`	Log returns matrix

Input Format

date,A,AA,AAL,...,ZTS
2000-01-04,0.012,-0.008,0.003,...,-0.005
2000-01-05,0.014,-0.006,0.001,...,0.002
...
2024-12-27,0.008,0.011,-0.002,...,0.007

Property	Value
Shape	6,285 rows × 2,501 columns (date + 2,500 tickers)
Tickers	2,500 U.S. stocks
Date range	2000-01-04 to 2024-12-27
Frequency	Daily (NYSE trading days only)
Return type	Log returns: ln(close_t / close_t-1)
NaN rate	0.0%
File size	~277 MB

Data Preprocessing

The ContagionDataset class performs these steps automatically:

Sliding windows: Extracts 30-day windows of returns for all 2,500 stocks
Target construction: Computes binary contagion labels per stock per horizon
Normalization: Z-score normalization fitted on training data only
Memory efficiency: Windows stored as numpy arrays, loaded on-demand by DataLoader

Memory Requirements

Component	Memory (approx)
Returns matrix (full)	~125 MB (float32)
Training windows (1,200 windows)	~720 MB
Model parameters (61M)	~244 MB
Adjacency matrix (2,500×2,500)	~50 MB (float32)
Total VRAM (training)	~3-4 GB
GPU required	RTX 3090 Ti (24GB) — well within limits

5. Output Data

Contagion Scores

File	Description
`outputs/results/StemGNN/contagion_scores_chunk{N}_{split}.csv`	Per-stock contagion probabilities

Format:

ticker,contagion_5d,contagion_20d,contagion_60d
AAPL,0.12,0.18,0.31
MSFT,0.09,0.15,0.28
...

Column	Range	Description
`ticker`	string	Stock symbol
`contagion_5d`	0.0-1.0	Probability of contagion-driven extreme loss in 5 trading days
`contagion_20d`	0.0-1.0	Probability in 20 trading days
`contagion_60d`	0.0-1.0	Probability in 60 trading days

Model Checkpoints

File	Description
`outputs/models/StemGNN/chunk{N}/best_model.pt`	Best validation loss model
`outputs/models/StemGNN/chunk{N}/latest_model.pt`	Most recent epoch model

XAI Outputs

File	Description
`outputs/results/StemGNN/xai/{chunk}_{split}_adjacency.npy`	Learned N×N adjacency matrix
`outputs/results/StemGNN/xai/{chunk}_{split}_top_influencers.json`	Top-10 influencers per stock
`outputs/results/StemGNN/xai/{chunk}_{split}_edge_importance.npy`	Gradient-based edge importance
`outputs/results/StemGNN/xai/{chunk}_{split}_gnnexplainer.json`	GNNExplainer subgraph (if enabled)

HPO Results

File	Description
`outputs/codeResults/StemGNN/best_params_chunk{N}.json`	Best hyperparameters per chunk
`outputs/codeResults/StemGNN/hpo.db`	Optuna study database (SQLite)

6. Model Specifications

Default Hyperparameters

Parameter	Default	Description
`window_size`	30	Trading days of lookback
`multi_layer`	13	Number of StemGNN spectral-temporal blocks
`stack_cnt`	2	Number of stacked StemGNN blocks
`dropout_rate`	0.75	Dropout for regularization
`leaky_rate`	0.2	LeakyReLU slope
`contagion_horizons`	[5, 20, 60]	Prediction horizons (trading days)
`extreme_quantile`	0.05	Bottom 5% of returns = extreme
`excess_threshold_std`	2.0	Standard deviations for excess negativity

Training Hyperparameters

Parameter	Default	Description
`batch_size`	8	Small due to 2,500×2,500 adjacency matrices
`epochs`	100	With early stopping (patience=20)
`learning_rate`	0.001	Initial learning rate
`decay_rate`	0.5	Exponential LR decay multiplier
`exponential_decay_step`	13	Decay LR every 13 epochs
`gradient_clip`	1.0	Max gradient norm
`optimizer`	RMSProp	From baseline paper
`norm_method`	z_score	Input normalization

Model Size

Component	Parameters
StemGNN base (GRU + attention + spectral blocks)	~60,968,000
Contagion heads (3 × 2-layer MLP)	~61,773
Total	~61,029,773

Parameter Count

The model has 61 million parameters. Most are in the spectral-temporal blocks (the StockBlockLayer which contains GLU modules and convolution weights). The GRU in the latent correlation layer has window_size × units = 30 × 2500 = 75,000 parameters. The self-attention keys and queries add another ~6,250 each.

7. Training Protocol

Chronological Splits

The module uses the same 3-chunk chronological split as all other project modules:

Chunk	Training	Validation	Testing	Label
1	2000-2004 (1,255 days)	2005 (252 days)	2006 (251 days)	chunk1
2	2007-2014 (2,014 days)	2015 (252 days)	2016 (252 days)	chunk2
3	2017-2022 (1,510 days)	2023 (250 days)	2024 (249 days)	chunk3

Training Loop

For each epoch:
Forward pass: returns_matrix → latent_correlation → spectral blocks → contagion heads
Loss: Binary Cross-Entropy between predicted probabilities and contagion targets
Backward pass with gradient clipping (max_norm=1.0)
RMSProp optimizer step
Every exponential_decay_step epochs: multiply LR by decay_rate
Validation: compute BCE on held-out validation period
Save best model (lowest validation loss)
Early stop if no improvement for 20 epochs

Anti-Leakage Rules

Chronological splits: Training data always precedes validation/test data
Normalization: Z-score statistics (mean, std) fitted on training data only, applied to validation/test
Target construction: Contagion labels use only historical data up to time t (2-year rolling window)
No future information: Forward returns are used ONLY as supervised targets, never as input features

8. Hyperparameter Optimization

Search Method

Optuna with TPE (Tree-structured Parzen Estimator) sampler.

Search Space

Parameter	Type	Range
`window_size`	Categorical	{15, 30, 60}
`multi_layer`	Categorical	{5, 8, 13, 20}
`dropout_rate`	Categorical	{0.5, 0.6, 0.75, 0.8}
`learning_rate`	Log-uniform	[1e-4, 1e-2]
`decay_rate`	Categorical	{0.3, 0.5, 0.7, 0.9}
`exponential_decay_step`	Categorical	{5, 8, 13}
`batch_size`	Categorical	{4, 8, 16}

HPO Configuration

Parameter	Value
Trials per chunk	50
Startup trials (random)	10
HPO epochs per trial	20
HPO training windows	2,000 (subset for speed)
Pruner	Median (n_startup_trials=5)
Storage	SQLite (`outputs/codeResults/StemGNN/hpo.db`)

Objective

Minimize validation binary cross-entropy loss on the contagion prediction task.

Estimated HPO Runtime

Chunk	Training Days	Windows	Est. Time per Trial	Total (50 trials)
1	1,255	1,165	~2-4 min	~2-3 hours
2	2,014	1,924	~4-6 min	~4-5 hours
3	1,510	1,420	~3-5 min	~3-4 hours

9. XAI Integration

Level 1: Learned Adjacency + Top Influencers (Always On)

What it provides:

The N×N adjacency matrix learned by the latent correlation layer
Top-10 most influential stocks for each stock (by attention weight)

How it works: The StemGNN’s latent_correlation_layer uses GRU + self-attention to learn which stocks influence each other. This adjacency matrix is extracted during the forward pass and averaged across all batches in the prediction set.

Output: {chunk}_{split}_adjacency.npy (2,500×2,500 matrix) + top_influencers.json

Example:

{
  "AAPL": [
    {"ticker": "MSFT", "weight": 0.89},
    {"ticker": "NVDA", "weight": 0.72},
    {"ticker": "QQQ", "weight": 0.68}
  ]
}

Level 2: Gradient-Based Edge Importance (Always On)

What it provides:

Per-edge importance scores computed via gradient backpropagation
Shows which connections most influenced the contagion prediction

How it works: During prediction, we compute ∂(contagion_score) / ∂(input) — how much does each input value affect the contagion score? The gradient magnitude at each input position approximates the importance of that stock’s return for the prediction.

Output: {chunk}_{split}_edge_importance.npy

Level 3: GNNExplainer Subgraph Mask (Opt-in)

What it provides:

A compact subgraph (subset of edges) that best explains a specific contagion prediction
Based on the GNNExplainer paper (Ying et al., NeurIPS 2019)

How it works: An optimization loop learns a mask over the adjacency matrix. The mask maximizes mutual information between the masked graph and the original prediction while minimizing the number of edges kept.

Enable with: --enable-gnnexplainer flag during predict command.

⚠️ Performance warning: GNNExplainer runs an optimization loop (50 iterations) per explained sample. It can add 1-2 minutes per sample. Use sparingly — for deep-dive explanations, not bulk prediction.

Output: {chunk}_{split}_gnnexplainer.json

10. GPU Optimizations

Applied Optimizations

Optimization	What It Does	Expected Speedup
Mixed Precision (AMP)	Uses float16 for forward/backward, float32 for master weights	1.5-2× faster on RTX 3090 Ti
cudnn.benchmark=True	Auto-tunes CUDA convolution algorithms for optimal performance	10-20% faster convolutions
pin_memory=True	Allocates DataLoader output in page-locked memory for faster CPU→GPU transfer	20-30% faster data loading
prefetch_factor=4	Preloads 4 batches ahead, keeping GPU continuously fed	Eliminates GPU idle time
num_workers=6	Uses 6 CPU threads for parallel data loading (12 threads available)	3-5× faster data loading
Vectorized targets	Replaced per-stock Python loops with numpy vectorized operations	10-50× faster dataset construction
non_blocking=True	Async CPU→GPU transfers overlap with GPU compute	5-10% end-to-end speedup
zero_grad(set_to_none=True)	Sets gradients to None instead of zero (more efficient memory)	5% memory + speed improvement

Memory Management

Batch size 8 keeps VRAM usage at ~3-4 GB (well within 24GB limit)
2,500×2,500 adjacency matrix = 25M entries × 4 bytes (float32) = 100 MB per matrix
With batch=8: 800 MB for activations
Ample headroom for gradient accumulation if needed

Expected Training Throughput

~15-30 seconds per epoch on 1,165 windows with batch_size=8
Mixed precision provides the largest single speedup
Full 100-epoch training: ~25-50 minutes per chunk (with early stopping typically cutting this to 30-40 epochs)

11. Dependencies

Python Packages

Package	Version	Purpose
`torch`	≥2.2.0	Deep learning framework
`numpy`	≥1.24	Numerical operations
`pandas`	≥2.0	Data loading and manipulation
`tqdm`	≥4.65	Progress bars
`optuna`	≥3.0	Hyperparameter optimization (optional, for HPO only)

Internal Dependencies

File	Purpose
`code/gnn/stemgnn_base_model.py`	Baseline StemGNN Model class (imported)
`code/gnn/stemgnn_forecast_dataloader.py`	Normalization utilities (imported)
`data/yFinance/processed/returns_panel_wide.csv`	Input returns matrix

Baseline Code Attribution

The baseline StemGNN model (stemgnn_base_model.py) is adapted from the official implementation by Cao et al. (2020), available at github.com/microsoft/StemGNN. The data loader and math utilities are adapted from the same repository.

12. Usage

Quick Start

# 1. Verify data availability
python code/gnn/stemgnn_contagion.py inspect

# 2. CPU smoke test (verify training loop works)
python code/gnn/stemgnn_contagion.py train-best --chunk 1 --device cpu --max-train-windows 50

# 3. Full HPO + Training + Prediction pipeline
python code/gnn/stemgnn_contagion.py hpo --chunk 1 --trials 50 --device cuda && \
python code/gnn/stemgnn_contagion.py train-best --chunk 1 --device cuda && \
python code/gnn/stemgnn_contagion.py hpo --chunk 2 --trials 50 --device cuda && \
python code/gnn/stemgnn_contagion.py train-best --chunk 2 --device cuda && \
python code/gnn/stemgnn_contagion.py hpo --chunk 3 --trials 50 --device cuda && \
python code/gnn/stemgnn_contagion.py train-best --chunk 3 --device cuda

# 4. Generate predictions with XAI
python code/gnn/stemgnn_contagion.py predict --chunk 1 --split test --device cuda

CLI Commands

Command	Description
`inspect`	Validate input data and show chronological splits
`hpo --chunk N`	Run hyperparameter optimization for chunk N
`train-best --chunk N`	Train with best HPO params (or defaults if no HPO)
`train-best-all`	Train all 3 chunks sequentially
`predict --chunk N --split S`	Generate contagion scores + XAI for chunk N, split S

Common Options

Option	Default	Description
`--device`	cuda	`cuda` or `cpu`
`--max-train-windows`	0 (all)	Limit windows for smoke testing
`--repo-root`	`.`	Project root directory
`--trials`	50	HPO trials per chunk

13. File Structure

fin-glassbox/
├── code/gnn/
│   ├── stemgnn_contagion.py           # ← THIS MODULE
│   ├── stemgnn_base_model.py          # Baseline StemGNN Model
│   ├── stemgnn_forecast_dataloader.py # Normalization utilities
│   ├── stemgnn_handler.py             # Original baseline train/validate
│   ├── stemgnn_utils.py               # Evaluation metrics
│   └── StemGNN.md              # ← THIS FILE
│
├── data/yFinance/processed/
│   └── returns_panel_wide.csv         # Input: 2,500 × 6,285 returns matrix
│
└── outputs/
    ├── models/StemGNN/
    │   ├── chunk1/best_model.pt
    │   ├── chunk2/best_model.pt
    │   └── chunk3/best_model.pt
    │
    ├── results/StemGNN/
    │   ├── contagion_scores_chunk1_test.csv
    │   └── xai/
    │       ├── chunk1_test_adjacency.npy
    │       ├── chunk1_test_top_influencers.json
    │       ├── chunk1_test_edge_importance.npy
    │       └── chunk1_test_gnnexplainer.json
    │
    └── codeResults/StemGNN/
        ├── best_params_chunk1.json
        └── hpo.db

14. Validation Metrics

Training Metrics

Metric	Description	Target
Binary Cross-Entropy	Primary training loss	Minimize
Validation BCE	Held-out period loss	Monitor for early stopping

Contagion Quality Metrics

Metric	How to Measure	Target
Crisis calibration	During 2008, 2020 crashes: do high-score stocks drop more?	High-score stocks should show larger drawdowns
Sector specificity	Tech sector drop → Tech contagion scores spike, Utilities don’t	Sector-specific, not just market beta
Predictive power	High contagion score at time t → elevated probability of large negative return at t+1	Positive correlation
Stability	Day-to-day score changes should be smooth, not erratic	Rolling std < 0.1
Sparsity	Learned adjacency should have ~50 meaningful edges per node (√2500)	Average degree ≈ 50

Known Limitations

Linear target thresholds: The 5% quantile and 2-std thresholds are fixed. Extreme market regimes may produce too many/few contagion signals.
No fundamental data: Contagion is purely price-based. Does not account for supply-chain relationships, common ownership, or macroeconomic linkages.
Static adjacency per batch: The adjacency matrix is learned per batch but averaged across the prediction set. Dynamic, time-varying adjacency would be more accurate but computationally prohibitive.
Training cost: 61M parameters with 2,500×2,500 matrices requires GPU. CPU training is impractical (10-50× slower).

Document Version: 1.0
Last Updated: 26 April 2026

This site is open source. Improve this page.

fin-glassbox

StemGNN Contagion Risk Module

Table of Contents

1. Overview

What It Does

Why StemGNN

Research Foundation

2. Architecture

High-Level Architecture

Baseline Model Internals (from Cao et al. 2020)

Contagion-Specific Modifications

3. Contagion Score Definition

Mathematical Definition

Why Binary Targets?

Multi-Horizon Design

4. Input Data

Primary Input

Input Format

Data Preprocessing

Memory Requirements

5. Output Data

Contagion Scores

Model Checkpoints

XAI Outputs

HPO Results

6. Model Specifications

Default Hyperparameters

Training Hyperparameters

Model Size

Parameter Count

7. Training Protocol

Chronological Splits

Training Loop

Anti-Leakage Rules

8. Hyperparameter Optimization

Search Method

Search Space

HPO Configuration

Objective

Estimated HPO Runtime

9. XAI Integration

Level 1: Learned Adjacency + Top Influencers (Always On)

Level 2: Gradient-Based Edge Importance (Always On)

Level 3: GNNExplainer Subgraph Mask (Opt-in)

10. GPU Optimizations

Applied Optimizations

Memory Management

Expected Training Throughput

11. Dependencies

Python Packages

Internal Dependencies

Baseline Code Attribution

12. Usage

Quick Start

CLI Commands

Common Options

13. File Structure

14. Validation Metrics

Training Metrics

Contagion Quality Metrics

Known Limitations