This document is the XAI specification for the project:
An Explainable Multimodal Neural Framework for Financial Risk Management
The purpose of this document is to define how explainability is designed into each module, propagated through the pipeline, and audited in the final decision.
The project is not simply a prediction system with explanation added afterward. Explainability is part of the architecture.
The central claim is:
A distributed financial AI system becomes more transparent when each specialised module exposes its own reasoning trace, and the final decision explicitly shows how those traces were fused and constrained.
The XAI design has three levels: module-level explanations, branch-level (qualitative/quantitative) syntheses, and a system-level fusion explanation.
This is stronger than only applying a generic post-hoc method to the final model because it preserves the internal structure of the decision process.
The XAI system must answer five questions for any final decision:
| Question | Required explanation |
|---|---|
| What did the system decide? | Buy / Hold / Sell, confidence, position size |
| Why did it decide that? | Fused quantitative + qualitative evidence |
| What risks mattered most? | Risk attention, rule caps, dominant risk drivers |
| Did any rule override the learned model? | Rule barrier trace |
| Can the decision be audited later? | Stored CSV/JSON XAI outputs with ticker-date keys |
The final explanation should be understandable to both technical auditors and non-technical decision makers.
ENCODERS
├── Temporal Encoder
│   ├── attention over timesteps
│   ├── embedding audit
│   └── finite/alignment checks
│
└── FinBERT Encoder
    ├── text chunk metadata
    ├── token/document provenance
    ├── PCA projection audit
    └── embedding trace
ANALYSTS
├── Sentiment Analyst
│   ├── sentiment score
│   ├── confidence / uncertainty
│   └── gradient or feature contribution trace
│
├── News Analyst
│   ├── event impact
│   ├── news importance
│   ├── risk relevance
│   └── event-driver explanation
│
├── Technical Analyst
│   ├── timestep attention
│   ├── gradient feature importance
│   └── counterfactual direction/timing explanation
│
├── Qualitative Analyst
│   ├── dominant qualitative driver
│   ├── event aggregation explanation
│   └── daily text evidence summary
│
└── Quantitative Analyst
    ├── attention over risk modules
    ├── top risk driver
    ├── risk-adjusted signal explanation
    └── position/risk context
RISK ENGINE
├── Volatility
│   └── volatility component explanation
├── Drawdown
│   └── attention + gradient + counterfactual explanation
├── VaR/CVaR
│   └── empirical tail-risk trace
├── StemGNN Contagion
│   └── adjacency + edge/node importance + optional GNNExplainer
├── Liquidity
│   └── rule/component trace
├── MTGNN Regime
│   └── graph property + macro stress explanation
└── Position Sizing
    └── cap/reduction/risk-budget rule trace
FUSION
├── learned quantitative weight
├── learned qualitative weight
├── learned signal/risk/confidence
├── rule barrier reasons
├── final position cap
└── final explanation summary
Every major module should return or store an explanation object using a consistent structure.
Recommended schema:
{
  "module": "ModuleName",
  "chunk": 1,
  "split": "test",
  "ticker": "AAPL",
  "date": "2024-03-28",
  "prediction": {
    "primary_output": 0.123,
    "recommendation": "HOLD",
    "confidence": 0.57,
    "risk_score": 0.44
  },
  "xai": {
    "summary": "Plain-English explanation of the output.",
    "top_drivers": [
      {"name": "contagion", "value": 0.25, "direction": "risk-increasing"},
      {"name": "volatility", "value": 0.20, "direction": "risk-increasing"}
    ],
    "method": "attention | gradient | rule_trace | graph_explainer | counterfactual",
    "confidence_notes": "What makes the explanation reliable or uncertain.",
    "limitations": "Known limitation of this explanation."
  },
  "provenance": {
    "input_files": [],
    "model_checkpoint": "",
    "generated_at": "",
    "row_id": "optional"
  }
}
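A minimal validator for the schema above can be sketched as follows. The helper name and the exact invariants checked are illustrative assumptions, not part of the specification:

```python
import math

# Hypothetical helper: checks the minimal invariants of an explanation object.
REQUIRED_TOP_LEVEL = {"module", "ticker", "date", "prediction", "xai", "provenance"}

def validate_explanation(obj: dict) -> list:
    """Return a list of problems; an empty list means the object passes."""
    problems = []
    missing = REQUIRED_TOP_LEVEL - obj.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
        return problems
    conf = obj["prediction"].get("confidence")
    if conf is None or not (0.0 <= conf <= 1.0):
        problems.append("confidence must be finite and in [0, 1]")
    if not obj["xai"].get("summary"):
        problems.append("xai.summary must be a non-empty plain-English string")
    for driver in obj["xai"].get("top_drivers", []):
        if not math.isfinite(driver.get("value", float("nan"))):
            problems.append(f"non-finite driver value: {driver.get('name')}")
    return problems
```

Running such a check when each module writes its JSON report keeps the audit requirements (Section on auditing below) enforced at generation time rather than discovered later.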
CSV outputs may carry a simplified version of this object via columns such as:
xai_summary
fusion_xai_summary
risk_summary
size_reduction_reasons
rule_barrier_reasons
JSON files should contain richer explanation reports.
The project uses five explanation levels.
| Level | Name | Purpose | Example |
|---|---|---|---|
| L0 | Data provenance | Shows where the input came from | ticker/date/filing metadata |
| L1 | Local driver explanation | Explains one prediction | attention weights, top risk driver |
| L2 | Mechanism explanation | Explains how the module produced the score | rule trace, graph properties |
| L3 | Counterfactual explanation | Shows what would change the output | lower drawdown risk would increase position |
| L4 | System-level explanation | Explains final fused decision | branch weights + rule barrier |
Not every module needs every level. However, every final user-facing decision should include L0, L1, L2, and L4. L3 should be included where computationally practical.
The Temporal Encoder explains which parts of the historical sequence contributed most to the learned market representation.
| Method | Purpose |
|---|---|
| Attention weights | Identify important timesteps |
| Gradient feature importance | Identify sensitive input dimensions/features |
| Embedding audit | Validate finite embeddings and row alignment |
Expected explanation artefacts:
outputs/embeddings/TemporalEncoder/*_manifest.csv
outputs/embeddings/TemporalEncoder/*_embeddings.npy
outputs/results/TemporalEncoder/xai/
The Temporal Encoder placed highest attention on the most recent and mid-window timesteps, indicating that both immediate price behaviour and recent historical context influenced the embedding.
Attention is not a perfect causal explanation. It shows where the model focused, not necessarily which feature caused the final downstream decision.
FinBERT must preserve text provenance and embedding traceability.
| Method | Purpose |
|---|---|
| Metadata trace | Links embedding rows to filings/chunks |
| Token/chunk provenance | Identifies form type, source section, filing date |
| PCA manifest | Shows 768→256 projection fitted on train only |
| Downstream explanation | Sentiment/News modules explain semantic effect |
chunk_id
doc_id
year
form_type
cik
filing_date
accession
source_name
chunk_index
word_count
FinBERT itself is an encoder. Its primary explainability is provenance and downstream semantic interpretation. Sentiment and News Analysts are responsible for producing explicit text-impact explanations.
The Sentiment Analyst explains the emotional/market tone inferred from financial text.
| Method | Purpose |
|---|---|
| Prediction decomposition | sentiment score, confidence, uncertainty, magnitude |
| Gradient feature importance | embedding dimensions that influenced sentiment |
| Metadata trace | filing section and document source |
| Summary sentence | plain-English sentiment interpretation |
sentiment_score
sentiment_confidence
sentiment_uncertainty
sentiment_magnitude
xai_summary
The text produced mildly negative sentiment with moderate confidence. The uncertainty remains material, so the qualitative branch should not dominate the final decision.
The News Analyst explains event importance and event risk relevance.
| Method | Purpose |
|---|---|
| Event-impact score | Indicates positive/negative event pressure |
| News importance | Indicates how important the event is |
| Risk relevance | Indicates whether the event is risk-related |
| Driver summary | Shows whether volatility, drawdown, or uncertainty dominates |
news_event_impact
news_importance
risk_relevance
volatility_spike
drawdown_risk
news_uncertainty
xai_summary
The filing section was treated as risk-relevant and mildly negative, increasing qualitative risk but not enough to override the quantitative branch.
The Technical Analyst explains market-direction evidence from temporal embeddings.
| Level | Method | Output |
|---|---|---|
| L1 | Attention weights | important timesteps |
| L2 | Gradient feature importance | embedding dimensions driving trend/momentum/timing |
| L3 | Counterfactuals | what change would alter technical call |
trend_score
momentum_score
timing_confidence
technical_confidence
technical_direction_score
xai_summary
The technical branch shows positive momentum but only moderate timing confidence, so the signal is supportive but not strong enough by itself to force a BUY.
The Volatility Model explains expected instability over short and medium horizons.
| Method | Purpose |
|---|---|
| GARCH component trace | classical volatility baseline |
| Recent realised volatility | recent observed risk |
| Neural adjustment explanation | learned correction from temporal embedding |
| Regime probability | low/medium/high volatility state |
vol_10d
vol_30d
volatility_risk_score
volatility_regime_label
volatility_confidence
garch_vol
recent_vol
Volatility risk is high because both recent realised volatility and the GARCH baseline are elevated, causing the position sizing module to reduce exposure.
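The component trace above implies a blended score. A hedged sketch of how the pieces could combine (the equal blend weights, the multiplicative neural correction, and the scale constant are illustrative assumptions; in the project the correction is learned from the temporal embedding):

```python
def volatility_risk_score(garch_vol: float, recent_vol: float,
                          neural_adjustment: float, scale: float = 0.4) -> float:
    """Blend classical and learned volatility components into a [0, 1] risk score."""
    base = 0.5 * garch_vol + 0.5 * recent_vol    # classical baseline + realised vol
    adjusted = base * (1.0 + neural_adjustment)  # learned correction (illustrative form)
    return min(1.0, max(0.0, adjusted / scale))  # clip to a bounded risk score
```

Exposing `garch_vol`, `recent_vol`, and the adjustment separately is what makes the "Volatility risk is high because..." summary verifiable.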
The Drawdown Risk Model explains downside path risk.
| Level | Method | Output |
|---|---|---|
| L1 | Attention weights | timesteps warning of drawdown |
| L2 | Gradient importance | embedding dimensions tied to drawdown risk |
| L3 | Counterfactuals | what would reduce drawdown estimate |
expected_drawdown_10d
expected_drawdown_30d
drawdown_risk_10d
drawdown_risk_30d
drawdown_risk_score
recovery_days_10d
recovery_days_30d
confidence_10d
confidence_30d
The model estimates moderate 30-day drawdown risk and a longer recovery period, so the risk engine reduces position size even if the directional signal is positive.
VaR explains the historical threshold loss at a confidence level.
VaR is statistical, so its explanation is not neural. It should expose:
var_95
var_99
The 95% historical VaR indicates that losses worse than this threshold occurred in approximately the worst 5% of days in the historical rolling window.
CVaR explains the average loss beyond the VaR threshold.
Like VaR, CVaR is statistical and should expose:
cvar_95
cvar_99
tail_ratio_95
tail_ratio_99
CVaR is more severe than VaR, meaning that once the loss threshold is breached, the average tail loss is materially larger than the VaR cutoff.
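Because VaR and CVaR here are empirical, the tail-risk trace can be reproduced directly from the historical return window. A minimal sketch (function name and return convention are illustrative, not the project's API):

```python
import numpy as np

def historical_var_cvar(returns, level: float = 0.95):
    """Empirical VaR/CVaR at the given confidence level.

    VaR is the loss threshold exceeded on roughly the worst (1 - level)
    fraction of days; CVaR is the average loss beyond that threshold.
    Both are returned as positive loss magnitudes.
    """
    losses = -np.asarray(returns, dtype=float)  # positive values = losses
    var = np.quantile(losses, level)            # e.g. the 95th-percentile loss
    tail = losses[losses >= var]                # days at or beyond the VaR cut
    cvar = tail.mean()                          # expected tail loss
    return float(var), float(cvar)
```

By construction `cvar >= var`, so the tail ratio (`cvar_95 / var_95`) is at least 1; a materially larger ratio is exactly what the summary sentence above reports.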
Liquidity explains whether a position can be traded safely.
Liquidity is best explained through a rule/component trace.
liquidity_score
slippage_estimate_pct
days_to_liquidate_1M
tradable
dv_score
vr_score
to_score
The asset is considered tradable because dollar volume and turnover are sufficient, and estimated slippage is low. Liquidity does not block the position.
or:
The asset is not tradable under the rule barrier because liquidity score is below the minimum threshold, forcing HOLD and zero position.
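The rule/component trace behind these two summaries can be produced by simple threshold checks that record every failed rule. A sketch (the threshold values and parameter names are illustrative assumptions):

```python
def liquidity_rule_trace(liquidity_score: float, slippage_pct: float,
                         days_to_liquidate: float, min_score: float = 0.3,
                         max_slippage: float = 0.5, max_days: float = 5.0):
    """Return (tradable, reasons); each failed rule contributes one reason."""
    reasons = []
    if liquidity_score < min_score:
        reasons.append(f"liquidity_score {liquidity_score:.2f} below min {min_score}")
    if slippage_pct > max_slippage:
        reasons.append(f"slippage {slippage_pct:.2f}% above max {max_slippage}%")
    if days_to_liquidate > max_days:
        reasons.append(f"days_to_liquidate {days_to_liquidate:.1f} above max {max_days}")
    return (len(reasons) == 0), reasons
```

An empty reason list yields the first summary ("tradable"); any non-empty list yields the second and forces HOLD under the rule barrier.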
StemGNN explains cross-asset contagion risk: whether risk may spread through relationships among assets.
| Level | Method | Purpose |
|---|---|---|
| L1 | Learned adjacency / top influencers | Which assets influence the target most |
| L2 | Gradient node/edge importance | Which relationships changed the score |
| L3 | GNNExplainer approximation | Local explanatory subgraph |
contagion_5d
contagion_20d
contagion_60d
contagion_risk_score
GNNExplainer-style explanations are more expensive and should be opt-in for large runs.
Recommended mode:
always-on: adjacency + top influencers + gradient importance
optional: --enable-gnnexplainer
Contagion is the top risk driver because the asset is strongly connected to a stressed cluster in the learned cross-asset graph.
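The always-on adjacency explanation can be produced by reading the target's row of the learned adjacency matrix. A sketch (the row-equals-incoming-influence layout is an assumption about the matrix convention):

```python
import numpy as np

def top_influencers(adjacency: np.ndarray, tickers: list, target: str, k: int = 3):
    """Return the k assets with the largest learned edge weight into target."""
    idx = tickers.index(target)
    row = adjacency[idx].copy()
    row[idx] = 0.0                      # ignore self-influence
    order = np.argsort(row)[::-1][:k]   # strongest incoming edges first
    return [(tickers[j], float(row[j])) for j in order]
```

These top-influencer pairs are cheap to compute for every row, which is why they belong in the always-on tier while GNNExplainer stays opt-in.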
The regime model explains the current market state using learned graph structure and macro/regime features.
| Method | Purpose |
|---|---|
| Graph properties | density, degree, entropy, edge weight stress |
| Macro stress score | macro contribution to regime state |
| Regime probabilities | probability of calm/volatile/crisis/rotation |
| Graph diff | optional comparison to previous period |
| Key edges | important graph connections |
regime_label
regime_confidence
prob_calm
prob_volatile
prob_crisis
prob_rotation
graph_density
avg_degree_norm
std_degree_norm
mean_edge_weight
max_edge_weight
graph_entropy
learned_graph_stress
macro_stress_score
label_graph_stress_score
The system classifies the market as crisis because graph stress and macro stress are elevated, and the crisis probability dominates other regime probabilities.
Position Sizing explains why a certain capital allocation was recommended.
This module should use rule trace explanations.
recommended_capital_fraction
recommended_capital_pct
position_fraction_of_max
binding_cap_source
hard_cap_applied
size_bucket
risk_budget_used
size_reduction_reasons
The recommended exposure is 3% because the asset is in a crisis regime; the regime hard cap is binding even though the technical signal is positive.
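The binding-cap logic behind such a trace can be sketched as taking the minimum over all active caps while recording which cap bound and why (cap names and values here are illustrative):

```python
def apply_caps(base_fraction: float, caps: dict):
    """Apply hard caps; report the binding cap source and every reduction reason."""
    binding_source, final = "none", base_fraction
    reasons = []
    for source, cap in caps.items():
        if cap < final:
            final, binding_source = cap, source
            reasons.append(f"capped_to_{cap:g}_by_{source}")
    return final, binding_source, reasons

final, source, reasons = apply_caps(0.10, {"crisis_regime": 0.03, "user_rule": 0.15})
# final is 0.03 and binding_cap_source is "crisis_regime"; the user rule did not bind
```

Recording the non-binding caps as well (here `user_rule`) makes the trace auditable: a reviewer can verify that the 3% exposure came from the regime cap and not from the user rule.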
The Qualitative Analyst explains the daily ticker-level text view.
| Method | Purpose |
|---|---|
| Event aggregation trace | number of events and source types |
| Dominant driver | risk relevance, news uncertainty, sentiment, etc. |
| Score decomposition | sentiment + news impact + confidence |
| Row-level summary | plain-English daily explanation |
event_count
sentiment_event_count
news_event_count
qualitative_score
qualitative_risk_score
qualitative_confidence
qualitative_recommendation
max_event_risk_score
mean_event_risk_score
mean_sentiment_score
mean_news_impact_score
mean_news_importance
dominant_qualitative_driver
xai_summary
The qualitative branch remains HOLD because the text signal is weak and confidence is low, even though risk relevance is present.
The Quantitative Analyst explains how technical and risk-engine evidence were combined.
| Method | Purpose |
|---|---|
| Risk attention weights | learned importance across risk modules |
| Top risk driver | most influential risk source |
| Risk-adjusted signal | final quantitative directional signal |
| Position context | recommended exposure and hard-cap reason |
attention_pooled_risk_score
top_attention_risk_driver
risk_attention_volatility
risk_attention_drawdown
risk_attention_var_cvar
risk_attention_contagion
risk_attention_liquidity
risk_attention_regime
xai_summary
The quantitative branch recommends HOLD because the technical signal is positive but contagion and regime risks dominate the risk attention, limiting the risk-adjusted signal.
If top_attention_risk_driver and the risk_attention_* columns are missing, the file is not final Fusion-ready.
Fusion explains the final decision.
It must show:
final_recommendation
final_fusion_signal
final_fusion_risk_score
final_fusion_confidence
final_position_fraction
final_position_pct
learned_recommendation
learned_quantitative_weight
learned_qualitative_weight
branch_weight_dominance
rule_changed_action
user_rule_cap_fraction
pre_rule_learned_position_fraction
rule_barrier_reasons
fusion_xai_summary
The learned fusion layer weighted the quantitative branch at 0.82 and the qualitative branch at 0.18. The final signal was mildly positive, but the user rule barrier capped position size because the regime was crisis. Final decision: HOLD with 3% maximum allowed exposure.
Learned model proposed BUY. Rule barrier changed action to HOLD because quantitative risk exceeded the buy-veto threshold and liquidity score was below the minimum threshold.
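The two example summaries above follow the same shape: a learned weighted combination, then a rule barrier that may cap or override it. A sketch under illustrative assumptions (the action thresholds and crisis cap are made up here; the real branch weights are learned):

```python
def fuse_and_apply_rules(quant_signal: float, qual_signal: float,
                         w_quant: float, w_qual: float, regime_label: str,
                         learned_fraction: float = 0.10, crisis_cap: float = 0.03):
    """Combine branch signals, then let explicit rules cap or override the result."""
    signal = w_quant * quant_signal + w_qual * qual_signal
    fraction, reasons = learned_fraction, []
    if regime_label == "crisis" and fraction > crisis_cap:
        fraction = crisis_cap                       # rule barrier binds
        reasons.append(f"crisis_cap_{crisis_cap:g}")
        reasons.append("position_reduced_by_rule_barrier")
    action = "BUY" if signal > 0.2 else "SELL" if signal < -0.2 else "HOLD"
    return action, signal, fraction, reasons
```

The key property is that the learned quantities (`signal`, `learned_fraction`) and the rule outcomes (`fraction`, `reasons`) are kept separate in the output, so the explanation can state exactly what the rules changed.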
The Final Trade Approver should be a thin, auditable layer.
It should not hide Fusion logic. It should format and preserve it.
Final explanation should include:
Final decision
Final confidence
Final position size
Quantitative branch summary
Qualitative branch summary
Top risk drivers
Rule barrier trace
Module-level evidence links
Some explanations are cheap and should always be generated. Others are expensive and should be optional.
| XAI type | Runtime cost | Policy |
|---|---|---|
| CSV xai_summary text | Low | Always on |
| Rule trace | Low | Always on |
| Attention weights | Low/medium | Always on when model supports it |
| Gradient importance | Medium | On validation/test or samples |
| Counterfactuals | Medium/high | Samples or requested tickers |
| GNNExplainer | High | Opt-in with flag |
| Full SHAP on large neural models | High | Sampled only |
Recommended approach:
Production full run: lightweight XAI for all rows + rich XAI for samples
Debug/defence run: enable heavier XAI for selected assets/dates
Recommended output structure:
outputs/results/
├── analysts/
│   ├── sentiment/
│   └── news/
│
├── TechnicalAnalyst/
│   └── xai/
│
├── QualitativeAnalyst/
│   └── xai/
│
├── QuantitativeAnalyst/
│   └── xai/
│
├── risk/
│   └── xai/
│
├── StemGNN/
│   └── xai/
│
├── MTGNNRegime/
│   └── xai/
│
├── PositionSizing/
│   └── xai/
│
└── FusionEngine/
    └── xai/
Each prediction CSV should include a compact explanation column. Each XAI folder should contain JSON summaries or richer sampled explanations.
The explanation trace should move forward with the prediction.
Module output CSV
├── numeric prediction columns
├── confidence/risk columns
├── compact xai_summary
└── optional path to rich XAI JSON
Downstream modules should not discard upstream explanations.
For example:
Sentiment xai_summary
News xai_summary
↓
Qualitative xai_summary
↓
Fusion qualitative_xai_summary
And:
Risk module summaries
Position sizing rule trace
Quantitative risk attention
↓
Fusion quantitative_xai_summary
↓
Final decision explanation
Text data is sparse. Many ticker-date rows have no matching filing/news event.
The system must explain missing text explicitly instead of treating it as positive or negative.
Neutral qualitative state:
qualitative_score = 0.0
qualitative_risk_score = 0.5
qualitative_confidence = 0.0
event_count = 0
dominant_qualitative_driver = no_text_event
Explanation:
No qualitative text event matched this ticker-date; the qualitative branch was kept neutral and received low fusion weight.
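Producing this neutral state explicitly, rather than leaving NaNs, can be sketched as a constant fallback row (values taken directly from the neutral state defined above; the function name is illustrative):

```python
NEUTRAL_QUALITATIVE_ROW = {
    "qualitative_score": 0.0,          # no directional text evidence
    "qualitative_risk_score": 0.5,     # neutral: neither "safe" nor "risky"
    "qualitative_confidence": 0.0,     # zero confidence -> low fusion weight
    "event_count": 0,
    "dominant_qualitative_driver": "no_text_event",
    "xai_summary": ("No qualitative text event matched this ticker-date; "
                    "the qualitative branch was kept neutral and received "
                    "low fusion weight."),
}

def qualitative_row(events: list) -> dict:
    """Fall back to the explicit neutral state when no text matched."""
    if not events:
        return dict(NEUTRAL_QUALITATIVE_ROW)  # copy, so callers cannot mutate it
    ...  # real event aggregation would go here
```

Because the fallback carries its own `xai_summary`, missing text is explained like any other outcome instead of silently defaulting.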
The user rule barrier is one of the most important explanation components.
It must always expose what the rules changed and why. Recommended fields:
rule_changed_action
user_rule_cap_fraction
pre_rule_learned_position_fraction
final_position_fraction
rule_barrier_reasons
Example values:
rule_barrier_reasons = crisis_cap_0.03; position_reduced_by_rule_barrier; action_changed_by_rule_barrier
XAI outputs must be audited, not just generated.
All numeric explanation columns must be finite.
Audit checks:
No NaN values in attention weights
No infinite risk scores
No confidence values outside [0, 1]
No branch weight pairs that fail to sum to 1
For attention distributions:
sum(attention_weights) ≈ 1
all weights >= 0
no missing risk modules
If final position is less than recommended position, rule_barrier_reasons must explain why.
If learned action differs from final action, rule_changed_action must be 1 and reasons must be non-empty.
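The audit checks above translate directly into assertions over each output row. A sketch using column names from this document (the function itself is illustrative):

```python
import math

def audit_row(row: dict, attention_keys: list) -> list:
    """Run the XAI audit checks on one row; return the list of violations."""
    violations = []
    weights = [row[k] for k in attention_keys]
    if any(not math.isfinite(w) or w < 0 for w in weights):
        violations.append("attention weights must be finite and non-negative")
    if abs(sum(weights) - 1.0) > 1e-3:
        violations.append("attention weights must sum to ~1")
    if not (0.0 <= row.get("final_fusion_confidence", 0.0) <= 1.0):
        violations.append("confidence outside [0, 1]")
    if (row.get("final_position_fraction", 0.0)
            < row.get("pre_rule_learned_position_fraction", 0.0)
            and not row.get("rule_barrier_reasons")):
        violations.append("position reduced without rule_barrier_reasons")
    return violations
```

A run is audit-clean only when every row returns an empty list; any non-empty list identifies exactly which guarantee the module failed to keep.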
Fusion-ready Quantitative files must contain:
top_attention_risk_driver
risk_attention_volatility
risk_attention_drawdown
risk_attention_var_cvar
risk_attention_contagion
risk_attention_liquidity
risk_attention_regime
attention_pooled_risk_score
Counterfactuals should be plausible: a proposed change such as "lower drawdown risk would increase position" must describe a state the inputs could realistically reach, not an arbitrary perturbation.
Explanations should not change wildly for tiny input perturbations unless the decision is near a threshold.
Recommended check:
Same ticker over neighbouring dates should show gradually changing drivers unless there is an actual event/regime shift.
Suggested explanation-quality metrics:
| Metric | Applies to | Meaning |
|---|---|---|
| Attention entropy | Technical, Quantitative, GNN | Whether attention is concentrated or diffuse |
| Top-driver stability | Quantitative, Fusion | Whether dominant drivers are stable across nearby dates |
| Rule-trace completeness | Position/Fusion | Whether every override has a reason |
| Counterfactual validity | Technical/Drawdown/Fusion | Whether proposed changes alter output as expected |
| Feature-importance consistency | Gradient/SHAP-style outputs | Whether important features remain meaningful across runs |
| Human readability | All modules | Whether explanation is understandable in report/UI |
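Attention entropy from the table above can be computed and normalised so that 0 means fully concentrated attention and 1 means a uniform distribution:

```python
import math

def attention_entropy(weights) -> float:
    """Normalised Shannon entropy of an attention distribution, in [0, 1]."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]  # drop zero-weight entries
    if len(probs) <= 1:
        return 0.0                                 # all mass on one module
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(weights))        # divide by max possible entropy
```

Tracking this value across nearby dates also supports the top-driver stability check: a sudden entropy collapse without an event or regime shift is worth flagging.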
A future interface should not show all raw columns by default. It should show layered explanation cards.
The report should describe XAI as architectural, not decorative.
Suggested wording:
The proposed framework integrates explainability at multiple levels. Individual modules expose local explanations such as attention weights, gradient-based feature sensitivity, rule traces, graph properties, and event-level text drivers. These module-level explanations are then propagated into qualitative and quantitative synthesis branches. The final hybrid Fusion Engine provides system-level explainability by reporting learned branch weights, final signal and risk estimates, position caps, and user-rule overrides. This enables the final Buy/Hold/Sell decision to be audited through both learned evidence and explicit risk-control logic.
The system must be honest about limitations, for example that attention shows where the model focused rather than what caused the outcome, that gradient-based importances can be unstable, and that sparse text coverage limits what the qualitative branch can explain.
These limitations do not invalidate the design. They define the boundary of what the explanations can claim.
Must provide at least one of:
Must provide:
Must provide:
Must provide:
[ ] Temporal Encoder has embedding manifests and attention/XAI samples.
[ ] FinBERT embeddings preserve metadata and PCA provenance.
[ ] Sentiment Analyst outputs sentiment explanation fields.
[ ] News Analyst outputs event/risk explanation fields.
[ ] Technical Analyst outputs attention/gradient/counterfactual XAI.
[ ] Volatility outputs component-level risk explanation.
[ ] Drawdown outputs attention/gradient/counterfactual XAI.
[ ] VaR/CVaR outputs statistical tail-risk trace.
[ ] Liquidity outputs rule/component trace.
[ ] StemGNN outputs adjacency/top-influencer/gradient XAI and optional GNNExplainer.
[ ] MTGNN Regime outputs graph property and macro/regime explanation.
[ ] Position Sizing outputs cap/reduction reasons.
[ ] Qualitative Analyst outputs daily qualitative XAI summaries.
[ ] Quantitative Analyst outputs risk attention weights and top risk driver.
[ ] Fusion outputs branch weights, rule barrier reasons, and final explanation.
[ ] Final decision output preserves module-level and system-level explanation trace.
The XAI strategy of this project is based on traceable modular reasoning.
Each model explains its own part of the decision. The qualitative and quantitative branches summarise their evidence. The Fusion Engine then explains how those branches were weighted, how risk affected the final decision, and whether user-defined safety rules changed the learned recommendation.
This makes the system defensible as an explainable financial AI framework because the final output is not just a prediction. It is a structured decision trace.