Project: Explainable Multimodal Neural Framework for Financial Risk Management
Module: GNN Contagion Risk (StemGNN)
File: code/gnn/stemgnn_contagion.py
The StemGNN Contagion Risk Module answers the question: “If other stocks crash, how badly will this stock get hurt?”
It uses a Spectral Temporal Graph Neural Network (StemGNN) to learn how financial distress propagates through the market. For each of 2,500 stocks, it outputs a contagion probability score (0-1) at three horizons: 5 days, 20 days, and 60 days.
StemGNN was chosen over simpler approaches (correlation matrices, linear models) because:
| Problem | Why StemGNN Solves It |
|---|---|
| Non-linear contagion | Crash propagation is non-linear — StemGNN’s spectral convolutions capture this |
| Hidden relationships | Latent correlation layer learns connections that simple correlation misses |
| Multi-hop propagation | Supplier→MSFT→AAPL: 2-hop paths that correlation can’t see |
| Temporal dynamics | Contagion changes over time — Fourier transforms capture cyclical patterns |
| Joint learning | Learns graph structure AND temporal patterns simultaneously, not separately |
This module builds on two peer-reviewed papers:
Cao et al. (2020) — “Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting” (NeurIPS 2020). The original StemGNN architecture.
Uygun & Sefer (2025) — “Financial asset price prediction with graph neural network-based temporal deep learning models” (Neural Computing and Applications). Validated StemGNN on cryptocurrency and Forex markets, achieving 9.29% MAPE.
Our adaptation: Instead of predicting future prices, we predict contagion-driven extreme negative returns — a risk-focused reformulation.
returns_panel_wide.csv (2,500 stocks × 6,285 days)
│
▼
┌──────────────────────────────────────┐
│ ContagionDataset │
│ - Sliding windows (30 days) │
│ - Builds binary contagion targets │
│ - Vectorized (no per-stock loops) │
└──────────────────────────────────────┘
│
▼
┌──────────────────────────────────────┐
│ ContagionStemGNN │
│ ┌────────────────────────────────┐ │
│ │ StemGNN Base (baseline code) │ │
│ │ - Latent Correlation Layer │ │
│ │ - Spectral-Temporal Blocks │ │
│ │ - Graph Fourier Transform │ │
│ │ - Discrete Fourier Transform │ │
│ └────────────────────────────────┘ │
│ ┌────────────────────────────────┐ │
│ │ Contagion Heads (3 horizons) │ │
│ │ - 5-day contagion probability │ │
│ │ - 20-day contagion probability│ │
│ │ - 60-day contagion probability│ │
│ └────────────────────────────────┘ │
└──────────────────────────────────────┘
│
▼
┌──────────────────────────────────────┐
│ XAI Extraction │
│ Level 1: Adjacency + Top Influencers│
│ Level 2: Gradient Edge Importance │
│ Level 3: GNNExplainer (opt-in) │
└──────────────────────────────────────┘
The StemGNN core operates in the spectral domain:
Latent Correlation Layer: Uses GRU + self-attention to learn an N×N adjacency matrix from the data. No predefined graph needed — it discovers which stocks influence each other.
Graph Fourier Transform (GFT): Projects the multivariate time series onto eigenvectors of the graph Laplacian. This decomposes the signal into orthogonal components — different “market modes.”
Discrete Fourier Transform (DFT): Transforms each component into the frequency domain. Captures cyclical patterns (weekly, monthly, quarterly).
Spectral-Temporal Blocks: 1D convolutions + Gated Linear Units (GLU) learn patterns in the frequency domain. The spectral representation is cleaner and easier to predict than raw time-domain data.
Inverse Transforms: IDFT → IGFT to return to the original space.
Stack of 2 Blocks: The second block learns residuals from the first, connected by skip connections.
The baseline StemGNN predicts future price values. We replace its final fc layer with 3 independent contagion heads:
stemgnn_output: (batch, num_nodes, window_size)
│
├──→ Head_5d: Linear(30→15) → LeakyReLU → Dropout → Linear(15→1) → Sigmoid
├──→ Head_20d: Linear(30→15) → LeakyReLU → Dropout → Linear(15→1) → Sigmoid
└──→ Head_60d: Linear(30→15) → LeakyReLU → Dropout → Linear(15→1) → Sigmoid
Each head outputs a probability [0,1] that the stock will experience a contagion-driven extreme negative return at that horizon.
For stock i at time t:
contagion_score_h[i] = P(
return_i[t:t+h] < threshold_i AND |return_i - E[return_i]| > 2 × σ_i
)
Where:
threshold_i = 5th percentile of stock i’s historical 2-year returns (extreme negative)E[return_i] = Expected return based on stock’s own 60-day historyσ_i = Standard deviation of stock’s 60-day returnsIn plain English: A stock gets a high contagion score if:
The second condition removes idiosyncratic drops (bad earnings → stock drops alone) from contagion drops (market selloff → stock gets dragged down).
Binary targets (contagion event: yes/no) are used because:
| Horizon | What It Captures | Trading Use |
|---|---|---|
| 5 days | Immediate spillover risk | Short-term position adjustment |
| 20 days | Medium-term contagion | Monthly rebalancing decisions |
| 60 days | Systemic risk buildup | Strategic allocation changes |
Longer horizons are inherently harder to predict but capture slower-building systemic risks (like the 2008 financial crisis, which unfolded over months).
| File | Description |
|---|---|
data/yFinance/processed/returns_panel_wide.csv |
Log returns matrix |
date,A,AA,AAL,...,ZTS
2000-01-04,0.012,-0.008,0.003,...,-0.005
2000-01-05,0.014,-0.006,0.001,...,0.002
...
2024-12-27,0.008,0.011,-0.002,...,0.007
| Property | Value |
|---|---|
| Shape | 6,285 rows × 2,501 columns (date + 2,500 tickers) |
| Tickers | 2,500 U.S. stocks |
| Date range | 2000-01-04 to 2024-12-27 |
| Frequency | Daily (NYSE trading days only) |
| Return type | Log returns: ln(close_t / close_t-1) |
| NaN rate | 0.0% |
| File size | ~277 MB |
The ContagionDataset class performs these steps automatically:
| Component | Memory (approx) |
|---|---|
| Returns matrix (full) | ~125 MB (float32) |
| Training windows (1,200 windows) | ~720 MB |
| Model parameters (61M) | ~244 MB |
| Adjacency matrix (2,500×2,500) | ~50 MB (float32) |
| Total VRAM (training) | ~3-4 GB |
| GPU required | RTX 3090 Ti (24GB) — well within limits |
| File | Description |
|---|---|
outputs/results/StemGNN/contagion_scores_chunk{N}_{split}.csv |
Per-stock contagion probabilities |
Format:
ticker,contagion_5d,contagion_20d,contagion_60d
AAPL,0.12,0.18,0.31
MSFT,0.09,0.15,0.28
...
| Column | Range | Description |
|---|---|---|
ticker |
string | Stock symbol |
contagion_5d |
0.0-1.0 | Probability of contagion-driven extreme loss in 5 trading days |
contagion_20d |
0.0-1.0 | Probability in 20 trading days |
contagion_60d |
0.0-1.0 | Probability in 60 trading days |
| File | Description |
|---|---|
outputs/models/StemGNN/chunk{N}/best_model.pt |
Best validation loss model |
outputs/models/StemGNN/chunk{N}/latest_model.pt |
Most recent epoch model |
| File | Description |
|---|---|
outputs/results/StemGNN/xai/{chunk}_{split}_adjacency.npy |
Learned N×N adjacency matrix |
outputs/results/StemGNN/xai/{chunk}_{split}_top_influencers.json |
Top-10 influencers per stock |
outputs/results/StemGNN/xai/{chunk}_{split}_edge_importance.npy |
Gradient-based edge importance |
outputs/results/StemGNN/xai/{chunk}_{split}_gnnexplainer.json |
GNNExplainer subgraph (if enabled) |
| File | Description |
|---|---|
outputs/codeResults/StemGNN/best_params_chunk{N}.json |
Best hyperparameters per chunk |
outputs/codeResults/StemGNN/hpo.db |
Optuna study database (SQLite) |
| Parameter | Default | Description |
|---|---|---|
window_size |
30 | Trading days of lookback |
multi_layer |
13 | Number of StemGNN spectral-temporal blocks |
stack_cnt |
2 | Number of stacked StemGNN blocks |
dropout_rate |
0.75 | Dropout for regularization |
leaky_rate |
0.2 | LeakyReLU slope |
contagion_horizons |
[5, 20, 60] | Prediction horizons (trading days) |
extreme_quantile |
0.05 | Bottom 5% of returns = extreme |
excess_threshold_std |
2.0 | Standard deviations for excess negativity |
| Parameter | Default | Description |
|---|---|---|
batch_size |
8 | Small due to 2,500×2,500 adjacency matrices |
epochs |
100 | With early stopping (patience=20) |
learning_rate |
0.001 | Initial learning rate |
decay_rate |
0.5 | Exponential LR decay multiplier |
exponential_decay_step |
13 | Decay LR every 13 epochs |
gradient_clip |
1.0 | Max gradient norm |
optimizer |
RMSProp | From baseline paper |
norm_method |
z_score | Input normalization |
| Component | Parameters |
|---|---|
| StemGNN base (GRU + attention + spectral blocks) | ~60,968,000 |
| Contagion heads (3 × 2-layer MLP) | ~61,773 |
| Total | ~61,029,773 |
The model has 61 million parameters. Most are in the spectral-temporal blocks (the StockBlockLayer which contains GLU modules and convolution weights). The GRU in the latent correlation layer has window_size × units = 30 × 2500 = 75,000 parameters. The self-attention keys and queries add another ~6,250 each.
The module uses the same 3-chunk chronological split as all other project modules:
| Chunk | Training | Validation | Testing | Label |
|---|---|---|---|---|
| 1 | 2000-2004 (1,255 days) | 2005 (252 days) | 2006 (251 days) | chunk1 |
| 2 | 2007-2014 (2,014 days) | 2015 (252 days) | 2016 (252 days) | chunk2 |
| 3 | 2017-2022 (1,510 days) | 2023 (250 days) | 2024 (249 days) | chunk3 |
For each epoch:
1. Forward pass: returns_matrix → latent_correlation → spectral blocks → contagion heads
2. Loss: Binary Cross-Entropy between predicted probabilities and contagion targets
3. Backward pass with gradient clipping (max_norm=1.0)
4. RMSProp optimizer step
5. Every exponential_decay_step epochs: multiply LR by decay_rate
6. Validation: compute BCE on held-out validation period
7. Save best model (lowest validation loss)
8. Early stop if no improvement for 20 epochs
Optuna with TPE (Tree-structured Parzen Estimator) sampler.
| Parameter | Type | Range |
|---|---|---|
window_size |
Categorical | {15, 30, 60} |
multi_layer |
Categorical | {5, 8, 13, 20} |
dropout_rate |
Categorical | {0.5, 0.6, 0.75, 0.8} |
learning_rate |
Log-uniform | [1e-4, 1e-2] |
decay_rate |
Categorical | {0.3, 0.5, 0.7, 0.9} |
exponential_decay_step |
Categorical | {5, 8, 13} |
batch_size |
Categorical | {4, 8, 16} |
| Parameter | Value |
|---|---|
| Trials per chunk | 50 |
| Startup trials (random) | 10 |
| HPO epochs per trial | 20 |
| HPO training windows | 2,000 (subset for speed) |
| Pruner | Median (n_startup_trials=5) |
| Storage | SQLite (outputs/codeResults/StemGNN/hpo.db) |
Minimize validation binary cross-entropy loss on the contagion prediction task.
| Chunk | Training Days | Windows | Est. Time per Trial | Total (50 trials) |
|---|---|---|---|---|
| 1 | 1,255 | 1,165 | ~2-4 min | ~2-3 hours |
| 2 | 2,014 | 1,924 | ~4-6 min | ~4-5 hours |
| 3 | 1,510 | 1,420 | ~3-5 min | ~3-4 hours |
What it provides:
How it works:
The StemGNN’s latent_correlation_layer uses GRU + self-attention to learn which stocks influence each other. This adjacency matrix is extracted during the forward pass and averaged across all batches in the prediction set.
Output: {chunk}_{split}_adjacency.npy (2,500×2,500 matrix) + top_influencers.json
Example:
{
"AAPL": [
{"ticker": "MSFT", "weight": 0.89},
{"ticker": "NVDA", "weight": 0.72},
{"ticker": "QQQ", "weight": 0.68}
]
}
What it provides:
How it works:
During prediction, we compute ∂(contagion_score) / ∂(input) — how much does each input value affect the contagion score? The gradient magnitude at each input position approximates the importance of that stock’s return for the prediction.
Output: {chunk}_{split}_edge_importance.npy
What it provides:
How it works: An optimization loop learns a mask over the adjacency matrix. The mask maximizes mutual information between the masked graph and the original prediction while minimizing the number of edges kept.
Enable with: --enable-gnnexplainer flag during predict command.
⚠️ Performance warning: GNNExplainer runs an optimization loop (50 iterations) per explained sample. It can add 1-2 minutes per sample. Use sparingly — for deep-dive explanations, not bulk prediction.
Output: {chunk}_{split}_gnnexplainer.json
| Optimization | What It Does | Expected Speedup |
|---|---|---|
| Mixed Precision (AMP) | Uses float16 for forward/backward, float32 for master weights | 1.5-2× faster on RTX 3090 Ti |
| cudnn.benchmark=True | Auto-tunes CUDA convolution algorithms for optimal performance | 10-20% faster convolutions |
| pin_memory=True | Allocates DataLoader output in page-locked memory for faster CPU→GPU transfer | 20-30% faster data loading |
| prefetch_factor=4 | Preloads 4 batches ahead, keeping GPU continuously fed | Eliminates GPU idle time |
| num_workers=6 | Uses 6 CPU threads for parallel data loading (12 threads available) | 3-5× faster data loading |
| Vectorized targets | Replaced per-stock Python loops with numpy vectorized operations | 10-50× faster dataset construction |
| non_blocking=True | Async CPU→GPU transfers overlap with GPU compute | 5-10% end-to-end speedup |
| zero_grad(set_to_none=True) | Sets gradients to None instead of zero (more efficient memory) | 5% memory + speed improvement |
| Package | Version | Purpose |
|---|---|---|
torch |
≥2.2.0 | Deep learning framework |
numpy |
≥1.24 | Numerical operations |
pandas |
≥2.0 | Data loading and manipulation |
tqdm |
≥4.65 | Progress bars |
optuna |
≥3.0 | Hyperparameter optimization (optional, for HPO only) |
| File | Purpose |
|---|---|
code/gnn/stemgnn_base_model.py |
Baseline StemGNN Model class (imported) |
code/gnn/stemgnn_forecast_dataloader.py |
Normalization utilities (imported) |
data/yFinance/processed/returns_panel_wide.csv |
Input returns matrix |
The baseline StemGNN model (stemgnn_base_model.py) is adapted from the official implementation by Cao et al. (2020), available at github.com/microsoft/StemGNN. The data loader and math utilities are adapted from the same repository.
# 1. Verify data availability
python code/gnn/stemgnn_contagion.py inspect
# 2. CPU smoke test (verify training loop works)
python code/gnn/stemgnn_contagion.py train-best --chunk 1 --device cpu --max-train-windows 50
# 3. Full HPO + Training + Prediction pipeline
python code/gnn/stemgnn_contagion.py hpo --chunk 1 --trials 50 --device cuda && \
python code/gnn/stemgnn_contagion.py train-best --chunk 1 --device cuda && \
python code/gnn/stemgnn_contagion.py hpo --chunk 2 --trials 50 --device cuda && \
python code/gnn/stemgnn_contagion.py train-best --chunk 2 --device cuda && \
python code/gnn/stemgnn_contagion.py hpo --chunk 3 --trials 50 --device cuda && \
python code/gnn/stemgnn_contagion.py train-best --chunk 3 --device cuda
# 4. Generate predictions with XAI
python code/gnn/stemgnn_contagion.py predict --chunk 1 --split test --device cuda
| Command | Description |
|---|---|
inspect |
Validate input data and show chronological splits |
hpo --chunk N |
Run hyperparameter optimization for chunk N |
train-best --chunk N |
Train with best HPO params (or defaults if no HPO) |
train-best-all |
Train all 3 chunks sequentially |
predict --chunk N --split S |
Generate contagion scores + XAI for chunk N, split S |
| Option | Default | Description |
|---|---|---|
--device |
cuda | cuda or cpu |
--max-train-windows |
0 (all) | Limit windows for smoke testing |
--repo-root |
. |
Project root directory |
--trials |
50 | HPO trials per chunk |
fin-glassbox/
├── code/gnn/
│ ├── stemgnn_contagion.py # ← THIS MODULE
│ ├── stemgnn_base_model.py # Baseline StemGNN Model
│ ├── stemgnn_forecast_dataloader.py # Normalization utilities
│ ├── stemgnn_handler.py # Original baseline train/validate
│ ├── stemgnn_utils.py # Evaluation metrics
│ └── StemGNN.md # ← THIS FILE
│
├── data/yFinance/processed/
│ └── returns_panel_wide.csv # Input: 2,500 × 6,285 returns matrix
│
└── outputs/
├── models/StemGNN/
│ ├── chunk1/best_model.pt
│ ├── chunk2/best_model.pt
│ └── chunk3/best_model.pt
│
├── results/StemGNN/
│ ├── contagion_scores_chunk1_test.csv
│ └── xai/
│ ├── chunk1_test_adjacency.npy
│ ├── chunk1_test_top_influencers.json
│ ├── chunk1_test_edge_importance.npy
│ └── chunk1_test_gnnexplainer.json
│
└── codeResults/StemGNN/
├── best_params_chunk1.json
└── hpo.db
| Metric | Description | Target |
|---|---|---|
| Binary Cross-Entropy | Primary training loss | Minimize |
| Validation BCE | Held-out period loss | Monitor for early stopping |
| Metric | How to Measure | Target |
|---|---|---|
| Crisis calibration | During 2008, 2020 crashes: do high-score stocks drop more? | High-score stocks should show larger drawdowns |
| Sector specificity | Tech sector drop → Tech contagion scores spike, Utilities don’t | Sector-specific, not just market beta |
| Predictive power | High contagion score at time t → elevated probability of large negative return at t+1 | Positive correlation |
| Stability | Day-to-day score changes should be smooth, not erratic | Rolling std < 0.1 |
| Sparsity | Learned adjacency should have ~50 meaningful edges per node (√2500) | Average degree ≈ 50 |
Linear target thresholds: The 5% quantile and 2-std thresholds are fixed. Extreme market regimes may produce too many/few contagion signals.
No fundamental data: Contagion is purely price-based. Does not account for supply-chain relationships, common ownership, or macroeconomic linkages.
Static adjacency per batch: The adjacency matrix is learned per batch but averaged across the prediction set. Dynamic, time-varying adjacency would be more accurate but computationally prohibitive.
Training cost: 61M parameters with 2,500×2,500 matrices requires GPU. CPU training is impractical (10-50× slower).
Document Version: 1.0
Last Updated: 26 April 2026