# Hyperparameter Configuration

## Document Purpose
This document contains all hyperparameter configurations for every model in the Explainable Multimodal Neural Framework for Financial Risk Management. These values are based on:
- Baseline reproduction results (MTGNN, StemGNN, FourierGNN)
- Industry standards for financial time series
- Anti-overfitting requirements discussed and approved
- Computational constraints (TPE Bayesian optimization, limited GPU resources)
**Version:** 1.0
**Status:** Ready for Implementation
## Table of Contents
- Global Training Configuration
- Encoder Layer
- Analyst Layer
- Risk Engine
- Fusion Layer
- Hyperparameter Search Spaces
- Learning Rate Schedules
- Regularization Summary
- YAML Configuration File
## 1. Global Training Configuration

| Parameter | Value | Description |
|---|---|---|
| device | "cuda" if available else "cpu" | Compute device |
| seed | 42 | Random seed for reproducibility |
| dtype | float32 | Default tensor type |
| num_workers | 4 | DataLoader workers |
| pin_memory | True | Faster GPU transfer |
| checkpoint_dir | "./checkpoints/" | Model save location |
| log_dir | "./logs/" | TensorBoard logs |
| mixed_precision | True | FP16 training for speed |
### Training Chunks (Chronological)

```yaml
training_chunks:
  chunk_1:
    train_years: [2000, 2001, 2002, 2003, 2004]
    val_years: [2005]
    test_years: [2006]
    description: "Dot-com recovery period"
  chunk_2:
    train_years: [2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014]
    val_years: [2015]
    test_years: [2016]
    description: "Financial crisis + recovery"
  chunk_3:
    train_years: [2017, 2018, 2019, 2020, 2021, 2022]
    val_years: [2023]
    test_years: [2024]
    description: "COVID + bull market"
```
## 2. Encoder Layer

### 2A. Shared Temporal Attention Encoder

| Parameter | Value | Search Range (TPE) | Notes |
|---|---|---|---|
| d_model | 128 | [64, 128, 256] | Embedding dimension |
| n_layers | 4 | [2, 3, 4, 5, 6] | Transformer layers |
| n_heads | 4 | [2, 4, 8] | Attention heads |
| d_ff | 512 | 4 × d_model | Feed-forward dimension |
| dropout | 0.1 | [0.05, 0.1, 0.15, 0.2] | General dropout |
| attention_dropout | 0.1 | [0.05, 0.1, 0.15] | Attention-specific dropout |
| activation | "gelu" | - | GELU (standard for transformers) |
| max_seq_len | 90 | - | Maximum lookback days |
| batch_size | 32 | [16, 32, 64] | Per GPU |
| epochs | 100 | [50, 75, 100, 150] | With early stopping |
| learning_rate | 1e-4 | loguniform(5e-5, 5e-4) | Peak LR after warmup |
| weight_decay | 1e-5 | loguniform(5e-6, 5e-5) | L2 regularization |
| warmup_steps | 4000 | [2000, 4000, 6000] | Linear warmup |
| lr_schedule | "cosine" | - | Cosine decay with warmup |
| gradient_clip | 1.0 | - | Clip gradient norm |
| label_smoothing | 0.05 | [0.0, 0.05, 0.1] | Soften labels |
| early_stop_patience | 20 | [10, 15, 20, 25] | Epochs without improvement |
| optimizer | "AdamW" | - | Adam with decoupled weight decay |

**TPE Trials:** 50-100
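As a dimensional sanity check, here is a minimal PyTorch sketch of an encoder with these settings. It is an illustrative stand-in, not the framework's actual module; note that the built-in layer exposes a single dropout rate, so the separate attention_dropout would need a custom layer:

```python
import torch.nn as nn

# Illustrative encoder matching the table: d_model=128, n_heads=4,
# d_ff=512 (4 x d_model), 4 layers, GELU activation, dropout 0.1.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=128,
    nhead=4,
    dim_feedforward=512,
    dropout=0.1,
    activation="gelu",
    batch_first=True,   # Input shape: (batch, seq_len <= 90, d_model)
)
temporal_encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
```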
### 2B. FinBERT Financial Text Encoder

| Parameter | Value | Search Range | Notes |
|---|---|---|---|
| base_model | "ProsusAI/finbert" | - | Pre-trained FinBERT |
| max_length | 512 | - | Max tokens per document |
| projection_dim | 256 | [128, 256, 384] | Output embedding size |
| freeze_base | False | - | Fine-tune entire model |
| batch_size | 16 | [8, 16, 32] | Small due to memory |
| epochs_per_chunk | 3 | [2, 3, 4, 5] | Fine-tuning epochs |
| learning_rate | 2e-5 | loguniform(1e-5, 5e-5) | Standard BERT fine-tune LR |
| weight_decay | 0.01 | loguniform(0.001, 0.1) | L2 regularization |
| warmup_proportion | 0.1 | [0.05, 0.1, 0.15] | Fraction of steps used for warmup |
| gradient_clip | 1.0 | - | Clip gradient norm |
| dropout | 0.1 | [0.05, 0.1, 0.15] | Classifier dropout |
| early_stop_patience | 5 | [3, 5, 7] | Epochs without improvement |
| optimizer | "AdamW" | - | Adam with decoupled weight decay |
| scheduler | "linear" | - | Linear decay with warmup |

**TPE Trials:** 20-30
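A minimal sketch of loading FinBERT with a projection head, assuming the Hugging Face `transformers` package; the `FinBertProjector` wrapper is a hypothetical illustration:

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")

class FinBertProjector(nn.Module):
    """Hypothetical wrapper: FinBERT backbone plus a linear projection
    to the 256-d output embedding. freeze_base: false, so the whole
    backbone is fine-tuned end to end."""
    def __init__(self, projection_dim: int = 256, dropout: float = 0.1):
        super().__init__()
        self.bert = AutoModel.from_pretrained("ProsusAI/finbert")
        self.dropout = nn.Dropout(dropout)
        self.proj = nn.Linear(self.bert.config.hidden_size, projection_dim)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]   # [CLS] token embedding
        return self.proj(self.dropout(cls))
```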
## 3. Analyst Layer

### 3A. Technical Analyst (BiLSTM)

| Parameter | Value | Search Range (TPE) | Notes |
|---|---|---|---|
| input_dim | 128 | - | Temporal embedding size |
| hidden_dim | 64 | [32, 64, 128] | LSTM hidden size |
| num_layers | 1 | [1, 2] | BiLSTM layers |
| bidirectional | True | - | Use both directions |
| dropout | 0.3 | [0.2, 0.3, 0.4] | LSTM dropout |
| use_attention_pooling | True | - | Weighted sequence pooling |
| output_dim | 3 | - | Trend, momentum, timing |
| batch_size | 64 | [32, 64, 128] | Per GPU |
| epochs | 50 | [30, 50, 75] | With early stopping |
| learning_rate | 1e-3 | loguniform(5e-4, 5e-3) | Adam LR |
| weight_decay | 1e-4 | loguniform(5e-5, 5e-4) | L2 regularization |
| gradient_clip | 1.0 | - | Clip gradient norm |
| early_stop_patience | 20 | [15, 20, 25] | Epochs without improvement |
| optimizer | "Adam" | - | Adam optimizer |
| scheduler | "reduce_on_plateau" | - | Reduce LR when loss stalls |

**TPE Trials:** 30-50
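A minimal sketch of a BiLSTM with attention pooling at these sizes; `AttnPoolBiLSTM` is a hypothetical illustration, not the framework's class:

```python
import torch
import torch.nn as nn

class AttnPoolBiLSTM(nn.Module):
    """Hypothetical BiLSTM: 128-d inputs, 64-d hidden per direction,
    attention-weighted pooling, 3 outputs (trend, momentum, timing)."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=128, hidden_size=64,
                            num_layers=1, bidirectional=True,
                            batch_first=True)
        self.dropout = nn.Dropout(0.3)   # Applied outside the 1-layer LSTM
        self.attn = nn.Linear(128, 1)    # 2 x 64 = 128 after concat
        self.head = nn.Linear(128, 3)

    def forward(self, x):                # x: (batch, seq_len, 128)
        h, _ = self.lstm(x)              # (batch, seq_len, 128)
        w = torch.softmax(self.attn(h), dim=1)  # Per-step attention weights
        pooled = (w * h).sum(dim=1)      # Weighted sequence pooling
        return self.head(self.dropout(pooled))
```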
### 3B. Sentiment Analyst (MLP)

| Parameter | Value | Search Range | Notes |
|---|---|---|---|
| input_dim | 256 | - | Text embedding size |
| hidden_dims | [128, 64] | [[128], [128, 64], [256, 128, 64]] | Layer sizes |
| output_dim | 2 | - | Polarity, confidence |
| dropout | 0.2 | [0.1, 0.2, 0.3] | Regularization |
| activation | "relu" | - | ReLU activation |
| batch_norm | True | - | Batch normalization |
| batch_size | 128 | [64, 128, 256] | Per GPU |
| epochs | 30 | [20, 30, 50] | With early stopping |
| learning_rate | 1e-3 | loguniform(5e-4, 5e-3) | Adam LR |
| weight_decay | 1e-5 | loguniform(5e-6, 5e-5) | L2 regularization |
| label_smoothing | 0.05 | [0.0, 0.05, 0.1] | Soften labels |
| early_stop_patience | 15 | [10, 15, 20] | Epochs without improvement |
| optimizer | "Adam" | - | Adam optimizer |

**Grid Search:** 9 combinations → **TPE Fine-tuning:** 20 trials
### 3C. News Analyst (Multi-Head Attention Pooling)

| Parameter | Value | Search Range (TPE) | Notes |
|---|---|---|---|
| input_dim | 256 | - | Per-document embedding size |
| num_heads | 4 | [2, 4, 8] | Attention heads |
| head_dim | 64 | - | Dimension per head |
| dropout | 0.1 | [0.05, 0.1, 0.15] | General dropout |
| attention_dropout | 0.1 | [0.05, 0.1, 0.15] | Attention dropout |
| output_dim | 2 | - | Impact score, relevance |
| batch_size | 64 | [32, 64, 128] | Per GPU (documents vary) |
| epochs | 40 | [30, 40, 50] | With early stopping |
| learning_rate | 1e-3 | loguniform(5e-4, 5e-3) | Adam LR |
| weight_decay | 1e-5 | loguniform(5e-6, 5e-5) | L2 regularization |
| gradient_clip | 1.0 | - | Clip gradient norm |
| early_stop_patience | 15 | [10, 15, 20] | Epochs without improvement |
| optimizer | "Adam" | - | Adam optimizer |

**TPE Trials:** 20-30
## 4. Risk Engine

### 4A. Volatility Estimation (GARCH + MLP Hybrid)

#### GARCH Component

| Parameter | Value | Notes |
|---|---|---|
| p | 1 | ARCH order |
| q | 1 | GARCH order |
| dist | "normal" | Error distribution |
| rolling_window | 252 | 1 year of trading days |
| update_frequency | "daily" | Recalculate daily |
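A minimal sketch of the GARCH(1,1) fit under these settings, assuming the `arch` package; `daily_returns` is a hypothetical pandas Series of percent returns:

```python
from arch import arch_model

# GARCH(1,1) with normal errors on a 252-day rolling window.
window = daily_returns.iloc[-252:]
model = arch_model(window, vol="Garch", p=1, q=1, dist="normal")
res = model.fit(disp="off")            # Suppress optimizer output
forecast = res.forecast(horizon=10)    # Variance forecast, next 10 days
vol_10d = forecast.variance.iloc[-1] ** 0.5   # Per-day volatility path
```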
#### MLP Component

| Parameter | Value | Search Range (TPE) | Notes |
|---|---|---|---|
| input_dim | 128 | - | Temporal embedding size |
| hidden_dims | [64] | [[32], [64], [64, 32]] | Hidden layer sizes |
| output_dim | 4 | - | Vol_10d, Vol_30d, Regime, Confidence |
| dropout | 0.2 | [0.1, 0.2, 0.3] | Regularization |
| activation | "relu" | - | ReLU activation |
| batch_size | 128 | [64, 128, 256] | Per GPU |
| epochs | 40 | [30, 40, 50] | With early stopping |
| learning_rate | 1e-3 | loguniform(5e-4, 5e-3) | Adam LR |
| weight_decay | 1e-5 | loguniform(5e-6, 5e-5) | L2 regularization |
| early_stop_patience | 15 | [10, 15, 20] | Epochs without improvement |
| optimizer | "Adam" | - | Adam optimizer |

**TPE Trials:** 30-50
### 4B. Drawdown Risk (BiLSTM Dual Horizon)

| Parameter | Value | Search Range (TPE) | Notes |
|---|---|---|---|
| input_dim | 128 | - | Temporal embedding size |
| hidden_dim | 64 | [32, 64, 128] | LSTM hidden size |
| num_layers | 1 | [1, 2] | BiLSTM layers |
| bidirectional | True | - | Use both directions |
| dropout | 0.3 | [0.2, 0.3, 0.4] | LSTM dropout |
| output_dim_10d | 3 | - | Prob, depth, recovery (10-day) |
| output_dim_30d | 3 | - | Prob, depth, recovery (30-day) |
| batch_size | 64 | [32, 64, 128] | Per GPU |
| epochs | 50 | [30, 50, 75] | With early stopping |
| learning_rate | 1e-3 | loguniform(5e-4, 5e-3) | Adam LR |
| weight_decay | 1e-4 | loguniform(5e-5, 5e-4) | L2 regularization |
| gradient_clip | 1.0 | - | Clip gradient norm |
| early_stop_patience | 20 | [15, 20, 25] | Epochs without improvement |
| optimizer | "Adam" | - | Adam optimizer |
| scheduler | "reduce_on_plateau" | - | Reduce LR when loss stalls |

**TPE Trials:** 30-50
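For context on the two 3-tuple heads, a minimal NumPy sketch of how a forward-looking max-drawdown depth label could be computed; the helper is hypothetical:

```python
import numpy as np

def max_drawdown(prices: np.ndarray) -> float:
    """Hypothetical label helper: deepest peak-to-trough drop over a
    forward window (e.g. the next 10 or 30 trading days)."""
    running_peak = np.maximum.accumulate(prices)
    drawdowns = (prices - running_peak) / running_peak
    return float(-drawdowns.min())   # Positive depth, e.g. 0.12 = -12%

# Example: depth label for the 10-day horizon starting at day t
# depth_10d = max_drawdown(close_prices[t:t + 10])
```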
4C. VaR & CVaR (Non-parametric)
No hyperparameters — statistical calculations only.
| Parameter |
Value |
Notes |
rolling_window |
504 |
2 years of trading days |
confidence_levels |
[0.95, 0.99] |
VaR thresholds |
update_frequency |
"daily" |
Recalculate daily |
method |
"historical" |
Empirical distribution |
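Since this module is purely statistical, a minimal NumPy sketch of historical VaR and CVaR under these settings; the returns array is hypothetical:

```python
import numpy as np

def historical_var_cvar(returns: np.ndarray, confidence: float = 0.95):
    """Empirical VaR/CVaR over a 504-day rolling window. Returns
    positive loss magnitudes, e.g. 0.021 = 2.1% daily loss."""
    window = returns[-504:]                       # 2 years of trading days
    var = -np.percentile(window, 100 * (1 - confidence))
    tail = window[window <= -var]                 # Losses beyond VaR
    cvar = -tail.mean() if tail.size else var
    return var, cvar

# var95, cvar95 = historical_var_cvar(daily_returns, confidence=0.95)
```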
### 4D. GNN Contagion Risk (StemGNN)

Based on successful baseline reproduction results.

| Parameter | Value | Search Range (TPE) | Notes |
|---|---|---|---|
| num_nodes | 4428 | - | Number of stocks |
| window_size | 30 | [15, 30, 60] | Days of lookback |
| horizon | 1 | - | Predict 1 step (contagion, not price) |
| multi_layer | 13 | [5, 8, 13, 20] | StemGNN blocks |
| embed_size | 32 | [16, 32, 64] | Node embedding dimension |
| hidden_size | 64 | [32, 64, 128] | Hidden dimension |
| learning_rate | 0.01 | loguniform(0.001, 0.05) | RMSprop LR |
| exponential_decay_step | 13 | [5, 8, 13] | LR decay step |
| decay_rate | 0.5 | [0.3, 0.5, 0.7, 0.9] | LR decay multiplier |
| dropout_rate | 0.75 | [0.5, 0.6, 0.75, 0.8] | Regularization |
| batch_size | 32 | [16, 32, 64] | Per GPU |
| epochs | 100 | [50, 75, 100] | With early stopping |
| optimizer | "RMSprop" | - | From baseline |
| norm_method | "z_score" | - | Input normalization |
| early_stop_patience | 20 | [15, 20, 25] | Epochs without improvement |
| gradient_clip | 1.0 | - | Clip gradient norm |
| train_length | 7 | - | Years for training split |
| valid_length | 2 | - | Years for validation split |
| test_length | 1 | - | Years for test split |
| leakyrelu_rate | 0.2 | - | LeakyReLU slope |
| cheb_k | 3 | - | Chebyshev polynomial order |
| top_k_edges | 66 | [20, 44, 66, 100] | Edges per node (√4428 ≈ 66) |

**TPE Trials:** 50-100
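The top_k_edges default follows the square-root heuristic noted in the table; the exact rounding is an assumption, but the integer square root reproduces the value:

```python
import math

num_nodes = 4428
top_k_edges = math.isqrt(num_nodes)   # floor(sqrt(4428)) = 66
```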
### 4E. Liquidity Risk (Rule-based)

No tunable hyperparameters; rule-based thresholds only.

| Parameter | Value | Notes |
|---|---|---|
| min_volume_percentile | 20 | Below this = illiquid |
| max_spread_pct | 0.5 | Above this = high slippage |
| market_cap_tiers | [10e9, 2e9] | Large (>$10B), Mid, Small (<$2B) |
| days_to_liquidate_threshold | 5 | >5 days = liquidity warning |
| update_frequency | "daily" | Recalculate daily |
### 4F. Regime Detection (MTGNN Graph Builder + Classifier)

#### Graph Builder Component

| Parameter | Value | Search Range | Notes |
|---|---|---|---|
| num_nodes | 4428 | - | Number of stocks |
| node_embedding_dim | 64 | [32, 64, 128] | MTGNN embedding size |
| top_k | 66 | [20, 44, 66, 100] | Edges per node |
| cheb_k | 3 | [2, 3, 5] | Chebyshev order |
| input_dim | 384 | - | 128 temporal + 256 text |
#### Classifier Component

| Parameter | Value | Search Range (TPE) | Notes |
|---|---|---|---|
| input_features | 5 | - | Density, modularity, avg_degree, clustering_coef, transitivity |
| hidden_dims | [32] | [[16], [32], [32, 16]] | Hidden layer sizes |
| output_classes | 4 | - | calm, volatile, crisis, rotation |
| dropout | 0.2 | [0.1, 0.2, 0.3] | Regularization |
| activation | "relu" | - | ReLU activation |
| batch_size | 256 | [128, 256, 512] | Per GPU |
| epochs | 30 | [20, 30, 40] | With early stopping |
| learning_rate | 1e-3 | loguniform(5e-4, 5e-3) | Adam LR |
| weight_decay | 1e-5 | loguniform(5e-6, 5e-5) | L2 regularization |
| early_stop_patience | 10 | [5, 10, 15] | Epochs without improvement |
| optimizer | "Adam" | - | Adam optimizer |

**Training Frequency:** Weekly (graph building only)
**TPE Trials:** 20-30 (classifier only)
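A minimal sketch of the five classifier input features, assuming NetworkX on the graph produced by the builder; the `regime_features` helper and the undirected graph `G` are hypothetical:

```python
import networkx as nx
from networkx.algorithms import community

def regime_features(G: nx.Graph) -> list:
    """Hypothetical extractor for the classifier's 5 inputs: density,
    modularity, avg_degree, clustering_coef, transitivity."""
    comms = community.greedy_modularity_communities(G)
    return [
        nx.density(G),
        community.modularity(G, comms),
        sum(d for _, d in G.degree()) / G.number_of_nodes(),
        nx.average_clustering(G),
        nx.transitivity(G),
    ]
```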
### 4G. Position Sizing Engine (Rule-based, User-adjustable)

No tunable hyperparameters; weights are user-configurable.

| Parameter | Default Value | User Adjustable | Notes |
|---|---|---|---|
| weight_volatility | 0.20 | Yes | Weight for volatility risk |
| weight_drawdown | 0.15 | Yes | Weight for drawdown risk |
| weight_var_cvar | 0.15 | Yes | Combined VaR/CVaR weight |
| weight_contagion | 0.25 | Yes | Weight for contagion risk |
| weight_liquidity | 0.15 | Yes | Weight for liquidity risk |
| weight_regime | 0.10 | Yes | Weight for regime risk |
| threshold_full | 0.30 | Yes | Below this = 100% position |
| threshold_high | 0.50 | Yes | Below this = 75% position |
| threshold_medium | 0.70 | Yes | Below this = 50% position |
| threshold_low | 0.85 | Yes | Below this = 25% position |
| veto_threshold | 0.90 | Yes | Above this = no trade |
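A minimal sketch of how these defaults could combine into a position size; the function and variable names are hypothetical, and the behaviour between 0.85 and 0.90 (unspecified above) is assumed to be no position:

```python
# Hypothetical composite: weighted sum of the six module risk scores
# (each in [0, 1]), mapped to a position fraction via the thresholds.
WEIGHTS = {"volatility": 0.20, "drawdown": 0.15, "var_cvar": 0.15,
           "contagion": 0.25, "liquidity": 0.15, "regime": 0.10}

def position_fraction(scores: dict) -> float:
    risk = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    if risk > 0.90:        # veto_threshold: no trade
        return 0.0
    if risk < 0.30:        # threshold_full: 100% position
        return 1.0
    if risk < 0.50:        # threshold_high: 75% position
        return 0.75
    if risk < 0.70:        # threshold_medium: 50% position
        return 0.50
    if risk < 0.85:        # threshold_low: 25% position
        return 0.25
    return 0.0             # Assumption: 0.85-0.90 also means no position
```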
## 5. Fusion Layer

### Fusion Engine (MLP + Rules)

#### MLP Component (Layer 1)

| Parameter | Value | Search Range (TPE) | Notes |
|---|---|---|---|
| input_dim | 13 | - | Qualitative(2) + Quantitative(2) + 9 module scores |
| hidden_dims | [64, 32] | [[32], [64, 32], [128, 64, 32]] | Hidden layer sizes |
| output_dim | 3 | - | Buy/Hold/Sell logits |
| dropout | 0.2 | [0.1, 0.2, 0.3] | Regularization |
| activation | "relu" | - | ReLU activation |
| batch_norm | True | - | Batch normalization |
| batch_size | 256 | [128, 256, 512] | Per GPU |
| epochs | 50 | [30, 50, 75] | With early stopping |
| learning_rate | 1e-3 | loguniform(5e-4, 5e-3) | Adam LR |
| weight_decay | 1e-5 | loguniform(5e-6, 5e-5) | L2 regularization |
| label_smoothing | 0.05 | [0.0, 0.05, 0.1] | Soften labels |
| early_stop_patience | 15 | [10, 15, 20] | Epochs without improvement |
| optimizer | "Adam" | - | Adam optimizer |
#### Rule-based Component (Layer 2)

| Rule | Condition | Action |
|---|---|---|
| liquidity_veto | liquidity_score < 0.3 | REJECT (no trade) |
| drawdown_cap | drawdown_probability > 0.8 | Cap size at 25% |
| contagion_veto | contagion_score > 0.9 | REJECT (no trade) |
| regime_override | regime == "crisis" AND confidence > 0.7 | Force SELL or HOLD |

**TPE Trials:** 50-100
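A minimal sketch of the Layer-2 overlay applied on top of the MLP's Buy/Hold/Sell output; the function and dictionary keys are hypothetical illustrations of the rules above:

```python
# Hypothetical rule overlay: hard constraints applied after the fusion
# MLP's decision. `risk` holds the module scores referenced by the rules.
def apply_rules(decision: str, size: float, risk: dict):
    if risk["liquidity_score"] < 0.3:       # liquidity_veto
        return "REJECT", 0.0
    if risk["contagion_score"] > 0.9:       # contagion_veto
        return "REJECT", 0.0
    if risk["regime"] == "crisis" and risk["confidence"] > 0.7:
        # regime_override: force SELL or HOLD
        if decision not in ("SELL", "HOLD"):
            decision = "HOLD"
    if risk["drawdown_probability"] > 0.8:  # drawdown_cap
        size = min(size, 0.25)
    return decision, size
```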
## 6. Hyperparameter Search Spaces

### TPE (Bayesian Optimization) Configuration

```yaml
tpe_config:
  algorithm: "tpe"        # Tree-structured Parzen Estimator
  n_initial_points: 20    # Random exploration before TPE
  n_trials:               # Model-specific (see below)
    temporal_encoder: 75
    finbert: 25
    fundamental_mlp: 30
    technical_analyst: 40
    sentiment_analyst: 20
    news_analyst: 25
    volatility_mlp: 40
    drawdown_bilstm: 40
    stemgnn: 75
    mtgnn_classifier: 25
    fusion_mlp: 75
  early_stop_trials: 20   # Stop if no improvement after 20 trials
  direction: "minimize"   # Minimize validation loss
  pruner: "median"        # Prune unpromising trials
```
### Grid Search Configuration (for LightGBM/XGBoost)

```yaml
grid_search_config:
  cv_folds: 5                        # Chronological cross-validation
  scoring: "neg_mean_squared_error"
  n_jobs: -1                         # Use all CPU cores
  verbose: 1
```
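Plain shuffled K-fold would leak future data, so a chronological alternative consistent with `cv_folds: 5` is scikit-learn's `TimeSeriesSplit`. A minimal sketch with LightGBM as the example estimator; the grid values are illustrative, not the framework's search space:

```python
from lightgbm import LGBMRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

param_grid = {"num_leaves": [15, 31, 63],        # illustrative values
              "learning_rate": [0.005, 0.01, 0.05]}

search = GridSearchCV(
    LGBMRegressor(n_estimators=500, subsample=0.7),
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),   # Chronological folds, no shuffling
    scoring="neg_mean_squared_error",
    n_jobs=-1,
    verbose=1,
)
# search.fit(X_train, y_train)
```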
## 7. Learning Rate Schedules

### Cosine Decay with Warmup (Temporal Encoder)

```python
import math

def cosine_decay_with_warmup(step, warmup_steps, total_steps, base_lr):
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr
        return base_lr * (step / warmup_steps)
    # Cosine decay from base_lr toward 0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))
```
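A brief usage note: since the function returns an absolute LR rather than a multiplier, one way to wire it in is to set each parameter group's LR directly per step (`optimizer`, `step`, and `total_steps` assumed defined in the training loop):

```python
# Inside the training loop, once per optimizer step:
for group in optimizer.param_groups:
    group["lr"] = cosine_decay_with_warmup(step, 4000, total_steps, 1e-4)
```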
### Reduce on Plateau (BiLSTM Models)

```python
from torch.optim.lr_scheduler import ReduceLROnPlateau

scheduler = ReduceLROnPlateau(
    optimizer,
    mode='min',      # Lower validation loss is better
    factor=0.5,      # Reduce LR by half
    patience=10,     # Wait 10 epochs without improvement
    min_lr=1e-6,     # Minimum LR
)
```
### Linear Decay with Warmup (FinBERT)

```python
def linear_decay_with_warmup(step, warmup_steps, total_steps, base_lr):
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr
        return base_lr * (step / warmup_steps)
    # Linear decay to 0; the max() clamp keeps the LR non-negative
    # if training runs past total_steps
    decay = 1 - (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * max(0.0, decay)
```
### Exponential Decay (StemGNN)

```python
def exponential_decay(epoch, base_lr, decay_step, decay_rate):
    # Multiply the LR by decay_rate once every decay_step epochs
    return base_lr * (decay_rate ** (epoch // decay_step))
```
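With the StemGNN defaults (decay_step=13, decay_rate=0.5) this is equivalent to PyTorch's built-in `StepLR`, which may be simpler to use in practice:

```python
from torch.optim.lr_scheduler import StepLR

# Halve the LR every 13 epochs (call scheduler.step() once per epoch)
scheduler = StepLR(optimizer, step_size=13, gamma=0.5)
```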
## 8. Regularization Summary

| Model | Dropout | Attn Dropout | Weight Decay | Label Smooth | Early Stop (epochs) | Grad Clip |
|---|---|---|---|---|---|---|
| Temporal Encoder | 0.1 | 0.1 | 1e-5 | 0.05 | 20 | 1.0 |
| FinBERT | 0.1 | 0.1 | 0.01 | - | 5 | 1.0 |
| Fundamental MLP | 0.2 | - | 1e-5 | - | 15 | - |
| Technical Analyst | 0.3 | - | 1e-4 | - | 20 | 1.0 |
| Sentiment Analyst | 0.2 | - | 1e-5 | 0.05 | 15 | 1.0 |
| News Analyst | 0.1 | 0.1 | 1e-5 | - | 15 | 1.0 |
| Volatility MLP | 0.2 | - | 1e-5 | - | 15 | 1.0 |
| Drawdown BiLSTM | 0.3 | - | 1e-4 | - | 20 | 1.0 |
| StemGNN | 0.75 | - | 1e-5 | - | 20 | 1.0 |
| MTGNN Classifier | 0.2 | - | 1e-5 | - | 10 | 1.0 |
| Fusion MLP | 0.2 | - | 1e-5 | 0.05 | 15 | 1.0 |
### XGBoost/LightGBM Regularization

| Model | L1 (reg_alpha) | L2 (reg_lambda) | Subsampling | Early Stop (rounds) |
|---|---|---|---|---|
| XGBoost (Fundamental Encoder) | 0.1 | 1.0 | 0.7 | 50 |
| LightGBM (Fundamental Analyst) | 0.1 | 1.0 | 0.7 | 50 |
## 9. YAML Configuration File

```yaml
# hyperparameters.yaml
# Complete hyperparameter configuration for all models

version: "1.0"
date: "2026-04-23"

global:
  device: "cuda"
  seed: 42
  dtype: "float32"
  num_workers: 4
  pin_memory: true
  mixed_precision: true
  checkpoint_dir: "./checkpoints/"
  log_dir: "./logs/"

training_chunks:
  chunk_1:
    train: [2000, 2001, 2002, 2003, 2004]
    val: [2005]
    test: [2006]
  chunk_2:
    train: [2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014]
    val: [2015]
    test: [2016]
  chunk_3:
    train: [2017, 2018, 2019, 2020, 2021, 2022]
    val: [2023]
    test: [2024]

encoders:
  temporal:
    model: "transformer"
    d_model: 128
    n_layers: 4
    n_heads: 4
    d_ff: 512
    dropout: 0.1
    attention_dropout: 0.1
    activation: "gelu"
    max_seq_len: 90
    batch_size: 32
    epochs: 100
    learning_rate: 1.0e-4
    weight_decay: 1.0e-5
    warmup_steps: 4000
    lr_schedule: "cosine"
    gradient_clip: 1.0
    label_smoothing: 0.05
    early_stop_patience: 20
    optimizer: "AdamW"
    tpe_trials: 75

  finbert:
    base_model: "ProsusAI/finbert"
    max_length: 512
    projection_dim: 256
    freeze_base: false
    batch_size: 16
    epochs_per_chunk: 3
    learning_rate: 2.0e-5
    weight_decay: 0.01
    warmup_proportion: 0.1
    gradient_clip: 1.0
    dropout: 0.1
    early_stop_patience: 5
    optimizer: "AdamW"
    scheduler: "linear"
    tpe_trials: 25

  fundamental:
    xgboost:
      max_depth: 4
      learning_rate: 0.01
      n_estimators: 500
      subsample: 0.7
      colsample_bytree: 0.7
      reg_alpha: 0.1
      reg_lambda: 1.0
      min_child_weight: 5
      early_stopping_rounds: 50
      objective: "reg:squarederror"
      tree_method: "hist"
    mlp:
      input_dim: 70
      hidden_dims: [256]
      output_dim: 128
      dropout: 0.2
      activation: "relu"
      use_layer_norm: true
      batch_size: 128
      epochs: 50
      learning_rate: 1.0e-3
      weight_decay: 1.0e-5
      early_stop_patience: 15
      optimizer: "Adam"
      tpe_trials: 30

analysts:
  technical:
    model: "bilstm"
    input_dim: 128
    hidden_dim: 64
    num_layers: 1
    bidirectional: true
    dropout: 0.3
    use_attention_pooling: true
    output_dim: 3
    batch_size: 64
    epochs: 50
    learning_rate: 1.0e-3
    weight_decay: 1.0e-4
    gradient_clip: 1.0
    early_stop_patience: 20
    optimizer: "Adam"
    scheduler: "reduce_on_plateau"
    tpe_trials: 40

  sentiment:
    model: "mlp"
    input_dim: 256
    hidden_dims: [128, 64]
    output_dim: 2
    dropout: 0.2
    activation: "relu"
    batch_norm: true
    batch_size: 128
    epochs: 30
    learning_rate: 1.0e-3
    weight_decay: 1.0e-5
    label_smoothing: 0.05
    early_stop_patience: 15
    optimizer: "Adam"
    tpe_trials: 20

  news:
    model: "attention_pooling"
    input_dim: 256
    num_heads: 4
    head_dim: 64
    dropout: 0.1
    attention_dropout: 0.1
    output_dim: 2
    batch_size: 64
    epochs: 40
    learning_rate: 1.0e-3
    weight_decay: 1.0e-5
    gradient_clip: 1.0
    early_stop_patience: 15
    optimizer: "Adam"
    tpe_trials: 25

  fundamental_lgb:
    model: "lightgbm"
    input_dim: 128
    objective: "multiclass"
    num_class: 3
    boosting_type: "gbdt"
    num_leaves: 31
    learning_rate: 0.01
    n_estimators: 500
    subsample: 0.7
    colsample_bytree: 0.7
    reg_alpha: 0.1
    reg_lambda: 1.0
    min_child_samples: 20
    min_child_weight: 0.001
    early_stopping_rounds: 50
    verbosity: -1

risk:
  volatility:
    garch:
      p: 1
      q: 1
      dist: "normal"
      rolling_window: 252
      update_frequency: "daily"
    mlp:
      input_dim: 128
      hidden_dims: [64]
      output_dim: 4
      dropout: 0.2
      activation: "relu"
      batch_size: 128
      epochs: 40
      learning_rate: 1.0e-3
      weight_decay: 1.0e-5
      early_stop_patience: 15
      optimizer: "Adam"
      tpe_trials: 40

  drawdown:
    model: "bilstm"
    input_dim: 128
    hidden_dim: 64
    num_layers: 1
    bidirectional: true
    dropout: 0.3
    output_dim_10d: 3
    output_dim_30d: 3
    batch_size: 64
    epochs: 50
    learning_rate: 1.0e-3
    weight_decay: 1.0e-4
    gradient_clip: 1.0
    early_stop_patience: 20
    optimizer: "Adam"
    scheduler: "reduce_on_plateau"
    tpe_trials: 40

  var_cvar:
    rolling_window: 504
    confidence_levels: [0.95, 0.99]
    update_frequency: "daily"
    method: "historical"

  contagion:
    model: "stemgnn"
    num_nodes: 4428
    window_size: 30
    horizon: 1
    multi_layer: 13
    embed_size: 32
    hidden_size: 64
    learning_rate: 0.01
    exponential_decay_step: 13
    decay_rate: 0.5
    dropout_rate: 0.75
    batch_size: 32
    epochs: 100
    optimizer: "RMSprop"
    norm_method: "z_score"
    early_stop_patience: 20
    gradient_clip: 1.0
    train_length: 7
    valid_length: 2
    test_length: 1
    leakyrelu_rate: 0.2
    cheb_k: 3
    top_k_edges: 66
    tpe_trials: 75

  liquidity:
    min_volume_percentile: 20
    max_spread_pct: 0.5
    market_cap_tiers: [10.0e9, 2.0e9]
    days_to_liquidate_threshold: 5
    update_frequency: "daily"

  regime:
    graph_builder:
      num_nodes: 4428
      node_embedding_dim: 64
      top_k: 66
      cheb_k: 3
      input_dim: 384
    classifier:
      input_features: 5
      hidden_dims: [32]
      output_classes: 4
      dropout: 0.2
      activation: "relu"
      batch_size: 256
      epochs: 30
      learning_rate: 1.0e-3
      weight_decay: 1.0e-5
      early_stop_patience: 10
      optimizer: "Adam"
      tpe_trials: 25

  position_sizing:
    weights:
      volatility: 0.20
      drawdown: 0.15
      var_cvar: 0.15
      contagion: 0.25
      liquidity: 0.15
      regime: 0.10
    thresholds:
      full: 0.30
      high: 0.50
      medium: 0.70
      low: 0.85
      veto: 0.90

fusion:
  mlp:
    input_dim: 13
    hidden_dims: [64, 32]
    output_dim: 3
    dropout: 0.2
    activation: "relu"
    batch_norm: true
    batch_size: 256
    epochs: 50
    learning_rate: 1.0e-3
    weight_decay: 1.0e-5
    label_smoothing: 0.05
    early_stop_patience: 15
    optimizer: "Adam"
    tpe_trials: 75

  rules:
    liquidity_veto:
      condition: "liquidity_score < 0.3"
      action: "REJECT"
    drawdown_cap:
      condition: "drawdown_probability > 0.8"
      action: "CAP_SIZE_25%"
    contagion_veto:
      condition: "contagion_score > 0.9"
      action: "REJECT"
    regime_override:
      condition: "regime == 'crisis' AND confidence > 0.7"
      action: "FORCE_SELL_OR_HOLD"

tpe_config:
  algorithm: "tpe"
  n_initial_points: 20
  early_stop_trials: 20
  direction: "minimize"
  pruner: "median"
```
**Document Version:** 1.0
**Status:** Ready for Implementation
**Next Step:** Generate training scripts based on these configurations.