fin-glassbox

An Explainable Multimodal Neural Framework for Financial Risk Management

A research-oriented financial AI system for building risk-aware, explainable market decision pipelines from multiple financial data modalities. The framework combines temporal market encoders, FinBERT-based text encoders, graph neural risk modelling, classical financial risk measures, trained analyst modules, interpretable position sizing, and hybrid fusion with explicit XAI traces.

The repository is designed around one central idea: financial decisions should not come from one opaque monolithic model. Instead, the system decomposes the financial decision problem into specialised modules, lets each module model one part of market risk or market context, and then fuses those outputs through a transparent, risk-constrained decision layer.

Research scope: This repository is for academic research, experimentation, and explainable AI system design. It is not financial advice, trading advice, or an investment product.



Project purpose

Financial risk management is a naturally multimodal problem. Market risk is visible in price movement and volatility, but also in financial text, macroeconomic state, liquidity, drawdown structure, cross-asset contagion, and regime shifts. A single model can learn some of these patterns, but it is difficult to audit, explain, and defend when the model’s internal reasoning is opaque.

This repository implements a modular neural framework that separates the problem into interpretable components: encoders, analyst modules, a Risk Engine, and a fusion layer with a user rule barrier.

The framework is built to support research questions such as:


System philosophy

The framework follows a specialisation + multimodality + explainability + risk-control design.

Specialisation

Each module is assigned a clear role. The volatility module estimates instability, the drawdown module estimates downside path risk, StemGNN estimates cross-asset contagion, MTGNN-style graph logic contributes to regime detection, FinBERT supports financial text understanding, and Fusion combines branch outputs into a final decision.

Multimodality

The system uses structured market data, textual filing/news representations, macro/regime data, and cross-asset relation graphs. These modalities are not merged prematurely. They are processed by specialised pipelines first, then integrated at higher synthesis layers.

Explainability

Every major module is expected to expose explanation artefacts, not only predictions. XAI is treated as a first-class system output rather than an afterthought.

Risk-aware decision-making

The Risk Engine is central. Fusion is not allowed to ignore risk outputs. Position sizing and final rules constrain the final output so that predicted opportunity does not automatically become unrestricted exposure.


High-level architecture

INPUTS
├── Time-Series Market Data
├── Financial Text Data
├── Macro / Regime Data
└── Cross-Asset Relation Data

ENCODERS
├── Shared Temporal Attention Encoder
└── FinBERT Financial Text Encoder

ANALYST MODULES
├── Technical Analyst
├── Sentiment Analyst
├── News Analyst
├── Qualitative Analyst
└── Quantitative Analyst

RISK ENGINE
├── Volatility Risk Module
├── Drawdown Risk Module
├── Historical VaR Module
├── CVaR / Expected Shortfall Module
├── StemGNN Contagion Risk Module
├── Liquidity Risk Module
├── MTGNN Regime Detection Module
└── Position Sizing Engine

FUSION
├── Learned Fusion Layer
└── User Rule Barrier

OUTPUT
├── Buy / Hold / Sell
├── Confidence Score
├── Position Size
├── Risk Summary
└── XAI Explanation Trace

The active implementation excludes fundamentals. Older design documents may mention fundamentals as part of earlier architecture planning; the current repository-level documentation treats the active implementation as market + text + macro + graph + risk + fusion.

For the full system workflow, see WORKFLOW.md. For the explainability design, see xAI.md.


Repository structure

fin-glassbox/
├── README.md
├── SETUP.md
├── WORKFLOW.md
├── xAI.md
├── LICENSE
├── requirements_linux_venv.txt
├── code/
│   ├── analysts/
│   ├── encoders/
│   ├── fusion/
│   ├── gnn/
│   ├── riskEngine/
│   └── yfinance_ib/
├── data/
│   ├── FRED_data/
│   ├── graphs/
│   ├── sec_edgar/
│   └── yFinance/
├── researchPapers/
└── outputs/

code/

Contains the executable model and pipeline code. Most module files are designed to be usable both as importable modules and as CLI scripts for inspection, smoke testing, HPO, training, prediction, and validation.

data/

Contains data engineering documentation and expected data-family locations. Large raw and processed datasets are usually external, ignored, or handled separately from normal Git tracking because of size.

outputs/

Contains generated embeddings, model checkpoints, predictions, HPO artefacts, XAI summaries, and fused decision outputs. This directory is a runtime artefact location rather than source documentation.

researchPapers/

Contains project-level research notes, workflow references, hyperparameter configuration notes, literature-related material, and older architectural references.


Documentation map

The repository is intentionally documentation-heavy because the system is modular. Important information is distributed across module-level README files and specialised markdown files.

Root-level documentation

| File | Purpose |
| --- | --- |
| README.md | Repository overview, architecture, documentation map, and project introduction. |
| SETUP.md | Environment setup, dependency installation, Git LFS, CUDA notes, and validation commands. |
| WORKFLOW.md | Full current workflow and module interaction design. |
| xAI.md | XAI integration strategy across encoders, analysts, risk modules, fusion, and final outputs. |
| LICENSE | Repository licence. |

Analyst documentation

| File | Purpose |
| --- | --- |
| code/analysts/README.md | Overview of all analyst modules. |
| code/analysts/SentimentAnalyst.md | FinBERT-based sentiment analyst documentation. |
| code/analysts/NewsAnalyst.md | News/event analyst documentation. |
| code/analysts/TechnicalAnalyst.md | Technical analyst documentation using temporal embeddings. |
| code/analysts/QuantitativeAnalyst.md | Quantitative analyst documentation and attention-weighted risk pooling design. |
| code/analysts/QualitativeAnalyst.md | Qualitative analyst documentation. |

Encoder documentation

| File | Purpose |
| --- | --- |
| code/encoders/README.md | Encoder folder overview. |
| code/encoders/TemporalEncoder.md | Shared Temporal Attention Encoder documentation. |
| code/encoders/FinBERT_Encoder.md | FinBERT encoder, MLM adaptation, PCA projection, and embedding generation documentation. |
| code/encoders/TextEncoder.md | Text encoding design notes and related text pipeline context. |

GNN documentation

| File | Purpose |
| --- | --- |
| code/gnn/README.md | GNN folder overview. |
| code/gnn/CrossAssetRelationData.md | Cross-asset graph data construction documentation. |
| code/gnn/StemGNN.md | StemGNN documentation. |
| code/gnn/StemGNN_Contagion.md | StemGNN contagion risk module documentation. |
| code/gnn/MTGNN.md | MTGNN usage documentation for regime graph construction. |

Risk Engine documentation

| File | Purpose |
| --- | --- |
| code/riskEngine/README.md | Risk Engine folder overview. |
| code/riskEngine/Volatility_Risk_Module.md | Volatility model documentation. |
| code/riskEngine/Drawdown_Risk_Module.md | Drawdown risk model documentation. |
| code/riskEngine/Regime_Detection_Module.md | Regime detection module documentation. |
| code/riskEngine/Position_Sizing_Engine.md | Position sizing engine documentation. |
| code/riskEngine/VaR_CVaR_Liquidity.md | Historical VaR, CVaR, and liquidity documentation. |

Fusion documentation

| File | Purpose |
| --- | --- |
| code/fusion/README.md | Hybrid learned-fusion and rule-barrier documentation. |

Data documentation

| File | Purpose |
| --- | --- |
| data/README.md | Data folder overview. |
| data/FRED_data/README.md | FRED macro/regime data documentation. |
| data/graphs/README.md | Cross-asset graph data folder documentation. |
| data/sec_edgar/processed/DataProcessing.md | SEC data processing methodology. |
| data/sec_edgar/processed/cleaned/README.md | Cleaned SEC data documentation. |
| data/yFinance/yFinance.md | Market data acquisition and processing documentation. |

Research and configuration notes

| File | Purpose |
| --- | --- |
| researchPapers/Hyperparameter_Config.md | Hyperparameter and HPO design references. |
| researchPapers/XAI_Specifications.md | Earlier XAI specification source. |
| researchPapers/MASTER_PROMPT.md | Project context reference. |
| researchPapers/MASTER_PROMPT2.md | Updated project context reference. |
| researchPapers/WORKFLOW_v1.md | Earlier workflow reference. |
| researchPapers/WORKFLOW_v2.md | Updated workflow reference. |

Data families

The framework is organised around four active data families.

1. Time-Series Market Data

Used by the Temporal Encoder, Technical Analyst, Volatility Risk Module, Drawdown Risk Module, VaR/CVaR calculations, Liquidity Risk Module, Position Sizing Engine, Quantitative Analyst, and Fusion.

Typical fields include:

Relevant documentation: data/yFinance/yFinance.md.

2. Financial Text Data

Used by FinBERT, Sentiment Analyst, News Analyst, and Qualitative Analyst.

Typical fields include:

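Relevant documentation: data/sec_edgar/processed/DataProcessing.md, data/sec_edgar/processed/cleaned/README.md, and code/encoders/FinBERT_Encoder.md.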

3. Macro / Regime Data

Used to support regime characterisation and market-state modelling.

Typical fields include:

Relevant documentation: data/FRED_data/README.md.

4. Cross-Asset Relation Data

Used for graph-based systemic and contagion risk modelling.

Typical fields include:

Relevant documentation: data/graphs/README.md and code/gnn/CrossAssetRelationData.md.


Major module families

Encoders

Encoders transform raw or engineered data into dense representations that downstream modules can consume.

See code/encoders/README.md.

Analysts

Analyst modules convert encoder and model outputs into semantically meaningful intermediate decisions.

See code/analysts/README.md.

GNN modules

GNN modules provide graph-aware modelling for relation-driven financial risk.

See code/gnn/README.md.

Risk Engine

The Risk Engine is the central risk-control block of the framework. It produces the risk summaries and constraints used by position sizing, quantitative analysis, fusion, and final decisions.

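It includes the volatility, drawdown, historical VaR, CVaR / expected shortfall, StemGNN contagion, liquidity, and MTGNN regime detection modules, together with the Position Sizing Engine.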

See code/riskEngine/README.md.
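
As a rough illustration of the classical measures handled by the Risk Engine, historical VaR and CVaR can be computed directly from a sample of returns. The sketch below is not the repository's implementation; the function name `historical_var_cvar`, the 80% confidence level, and the sample returns are assumptions chosen for illustration.

```python
import math

def historical_var_cvar(returns, confidence=0.95):
    """Return (VaR, CVaR) as positive loss fractions from a sample of returns."""
    if not returns:
        raise ValueError("returns must be non-empty")
    losses = sorted(-r for r in returns)  # convert returns to losses, ascending
    # Index of the empirical loss quantile at the chosen confidence level.
    idx = min(len(losses) - 1, math.ceil(confidence * len(losses)) - 1)
    var = losses[idx]
    tail = losses[idx:]                   # losses at or beyond the VaR level
    cvar = sum(tail) / len(tail)          # expected shortfall: mean tail loss
    return var, cvar

# Illustrative (made-up) daily returns.
sample = [0.01, -0.02, 0.005, -0.05, 0.015, -0.01, 0.02, -0.03, 0.0, 0.01]
var80, cvar80 = historical_var_cvar(sample, confidence=0.80)
print(f"VaR(80%)={var80:.4f}  CVaR(80%)={cvar80:.4f}")
```

CVaR is never smaller than VaR at the same confidence level, because it averages the losses at or beyond the VaR threshold. See code/riskEngine/VaR_CVaR_Liquidity.md for the method the repository actually uses.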

Fusion

Fusion integrates the qualitative and quantitative branches. It uses a learned fusion layer to estimate branch weights and decision signals, followed by a rule barrier that enforces user-defined risk constraints.

See code/fusion/README.md.


Fusion and final decision design

The Fusion Engine is intentionally hybrid.

Quantitative Analyst output
        │
        ├── learned branch weighting
        │
Qualitative Analyst output
        │
        ▼
Learned Fusion Layer
        │
        ▼
User Rule Barrier
        │
        ▼
Final Decision + Explanation

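The learned layer estimates branch weights and decision signals.

The rule barrier applies explicit, user-defined constraints, such as limits tied to position sizing, liquidity, drawdown risk, contagion risk, and regime caps.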

This design keeps the final system transparent: the model can learn how to combine evidence, but it cannot silently bypass the risk policy.
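
The hybrid design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in code/fusion/: the names `learned_fusion`, `rule_barrier`, and `Decision`, the ±0.2 action thresholds, the 10% size cap, and the 0.3 drawdown limit are all hypothetical, and the branch logits stand in for what a trained fusion layer would produce.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Decision:
    action: str            # "buy", "hold", or "sell"
    confidence: float
    position_size: float   # fraction of capital
    trace: list = field(default_factory=list)  # human-readable explanation steps

def learned_fusion(quant_score, qual_score, quant_logit, qual_logit):
    """Combine two branch scores in [-1, 1] using softmax branch weights."""
    m = max(quant_logit, qual_logit)           # subtract max for numerical stability
    eq, el = math.exp(quant_logit - m), math.exp(qual_logit - m)
    w_quant = eq / (eq + el)
    w_qual = 1.0 - w_quant
    return w_quant * quant_score + w_qual * qual_score, w_quant, w_qual

def rule_barrier(fused, proposed_size, risk, max_drawdown_risk=0.3, size_cap=0.10):
    """Apply explicit constraints after the learned layer and record each trigger."""
    action = "buy" if fused > 0.2 else "sell" if fused < -0.2 else "hold"
    trace = [f"fused score {fused:+.2f} -> {action}"]
    size = proposed_size if action != "hold" else 0.0
    if size > size_cap:
        trace.append(f"size capped from {size:.2f} to {size_cap:.2f}")
        size = size_cap
    if action == "buy" and risk["drawdown"] > max_drawdown_risk:
        trace.append(
            f"drawdown risk {risk['drawdown']:.2f} exceeds {max_drawdown_risk:.2f}: "
            "buy overridden to hold"
        )
        action, size = "hold", 0.0
    return Decision(action, abs(fused), size, trace)

fused, w_quant, w_qual = learned_fusion(0.8, 0.4, quant_logit=1.0, qual_logit=0.0)
decision = rule_barrier(fused, proposed_size=0.25, risk={"drawdown": 0.45})
for step in decision.trace:
    print(step)
```

In this example the learned layer favours a buy, but the barrier first caps the size and then overrides the action because the drawdown risk is above the configured limit. Each step is written to the trace, so the override is visible in the final explanation.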


Explainability design

XAI is integrated across the system at three levels.

Module-level XAI

Each major model exposes local explanations relevant to its own modelling task. Examples include attention weights, gradient importance, top risk drivers, graph properties, graph edges, counterfactual summaries, and rule-trigger explanations.

Branch-level XAI

The Qualitative Analyst and Quantitative Analyst aggregate module outputs and preserve branch-level reasoning. The system records why a text signal or a risk driver dominates a branch-level output.

System-level XAI

Fusion combines branch explanations and adds final rule-barrier explanations. Final outputs include both the learned fusion rationale and any user-rule override rationale.

For the full XAI design, see xAI.md.


Setup

Environment setup, Git LFS, Python versioning, dependency installation, and smoke-test validation are documented in SETUP.md.

A typical setup flow is:

cd ~/fin-glassbox && python --version
cd ~/fin-glassbox && source venv3.12.7/bin/activate
cd ~/fin-glassbox && python -m py_compile code/encoders/temporal_encoder.py code/encoders/finbert_encoder.py
cd ~/fin-glassbox && python code/fusion/final_fusion.py smoke --repo-root . --device cuda

See SETUP.md for the complete setup and validation procedure.


Common command pattern

Most major model files follow this pattern:

cd ~/fin-glassbox && python path/to/module.py inspect --repo-root .
cd ~/fin-glassbox && python path/to/module.py smoke --repo-root . --device cuda
cd ~/fin-glassbox && python path/to/module.py hpo --repo-root . --chunk 1 --trials 30 --device cuda --fresh
cd ~/fin-glassbox && python path/to/module.py train-best --repo-root . --chunk 1 --device cuda --fresh
cd ~/fin-glassbox && python path/to/module.py predict --repo-root . --chunk 1 --split test --device cuda
cd ~/fin-glassbox && python path/to/module.py validate --repo-root . --chunk 1 --split test

Exact commands differ by module. Use the module-specific markdown files for the correct CLI interface.
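
One way to structure a module so that it supports this command pattern is an `argparse` parser with one subcommand per action. The sketch below is hypothetical and only mirrors the flags shown above; the defaults and the `--split` choices (`train`, `val`, `test`) are assumptions, and real modules may accept different options.

```python
import argparse

# Flags accepted by each subcommand, following the command pattern above.
FLAGS = {
    "inspect": [],
    "smoke": ["device"],
    "hpo": ["chunk", "trials", "device", "fresh"],
    "train-best": ["chunk", "device", "fresh"],
    "predict": ["chunk", "split", "device"],
    "validate": ["chunk", "split"],
}

def build_parser():
    parser = argparse.ArgumentParser(prog="module.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, flags in FLAGS.items():
        p = sub.add_parser(name)
        p.add_argument("--repo-root", default=".")
        if "chunk" in flags:
            p.add_argument("--chunk", type=int, default=1)
        if "trials" in flags:
            p.add_argument("--trials", type=int, default=30)
        if "split" in flags:
            # Split names are assumed; only "test" appears in the examples above.
            p.add_argument("--split", choices=["train", "val", "test"], default="test")
        if "device" in flags:
            p.add_argument("--device", default="cpu")
        if "fresh" in flags:
            p.add_argument("--fresh", action="store_true")
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    # A real module would dispatch to its own handler for args.command here.
    return args

example_args = main(["hpo", "--repo-root", ".", "--chunk", "1",
                     "--trials", "30", "--device", "cuda", "--fresh"])
print(example_args.command, example_args.trials, example_args.fresh)
```

Passing `argv` explicitly keeps `main` callable from other code and from tests, which fits the goal of files that work both as importable modules and as CLI scripts.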


Outputs and artefacts

Runtime outputs are organised under outputs/. Common output categories include:

outputs/
├── embeddings/
├── models/
├── results/
├── codeResults/
└── cache/

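Typical artefacts include generated embeddings, model checkpoints, predictions, HPO artefacts, XAI summaries, and fused decision outputs.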

Large outputs should generally not be committed to normal Git history. Use .gitignore, Git LFS, or external storage where appropriate.


Research and engineering safeguards

The framework is designed around the following safeguards:

Chronological discipline

Financial data must be split and evaluated chronologically. Random time mixing can introduce leakage and invalid backtesting conclusions.
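
A minimal sketch of a chronological split, assuming each row carries a `date` field. The function name `chronological_split` and the cut-off dates are illustrative; they are not the repository's chunk or split definitions.

```python
from datetime import date

def chronological_split(rows, train_end, val_end):
    """Split rows by date so that train < validation < test in time."""
    rows = sorted(rows, key=lambda r: r["date"])
    train = [r for r in rows if r["date"] <= train_end]
    val = [r for r in rows if train_end < r["date"] <= val_end]
    test = [r for r in rows if r["date"] > val_end]
    return train, val, test

# Twelve made-up monthly observations.
rows = [{"date": date(2020, m, 1), "ret": 0.01 * m} for m in range(1, 13)]
train, val, test = chronological_split(rows, date(2020, 8, 31), date(2020, 10, 31))
print(len(train), len(val), len(test))
```

Because the boundaries are dates rather than random indices, no test-period row can end up in the training set.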

Point-in-time alignment

Ticker, date, filing date, macro date, and graph snapshot date must be aligned so that future information is not used for earlier predictions.
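
Point-in-time alignment can be illustrated with a lookup that only returns values already available on the prediction date. This is a sketch under assumptions: the helper `latest_known` and the sample macro observations are hypothetical, and the key idea is to index by availability date (for example, a filing or release date) rather than by the period the value describes.

```python
from bisect import bisect_right
from datetime import date

def latest_known(observations, as_of):
    """Return the most recent value available on or before `as_of`, else None.

    `observations` is a list of (available_date, value) pairs sorted by date.
    """
    dates = [d for d, _ in observations]
    i = bisect_right(dates, as_of)   # number of observations available by as_of
    return observations[i - 1][1] if i else None

# Made-up macro releases, keyed by the date they became public.
macro = [
    (date(2021, 1, 15), 1.4),
    (date(2021, 2, 12), 1.7),
    (date(2021, 3, 10), 2.6),
]

print(latest_known(macro, date(2021, 2, 11)))  # release of 12 Feb is not yet visible
print(latest_known(macro, date(2021, 2, 12)))
print(latest_known(macro, date(2021, 1, 1)))   # nothing published yet
```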

Train-only fitting

Normalisers, PCA transformations, and fitted preprocessing objects should be trained on training splits only, then applied to validation and test splits.
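
The same rule in miniature: statistics are estimated on the training split and then reused unchanged on later splits. The class `TrainOnlyScaler` is a hypothetical stand-in for whatever normaliser or PCA object a module fits, and the values are made up.

```python
from statistics import mean, pstdev

class TrainOnlyScaler:
    """Standardise values using statistics estimated on the training split only."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0   # guard against zero variance
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

train_values = [1.0, 2.0, 3.0, 4.0, 5.0]
test_values = [6.0, 7.0]

scaler = TrainOnlyScaler().fit(train_values)   # fit on train only
train_z = scaler.transform(train_values)
test_z = scaler.transform(test_values)         # reuse train statistics; no refit
print(test_z)
```

Refitting on the test split would leak its mean and variance into the features, so the scaler is fitted once and only applied afterwards.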

Schema validation

Downstream modules should fail loudly when expected columns are missing or stale schemas are detected.
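
A hedged sketch of a loud schema check. The exception `SchemaError`, the function `validate_schema`, the column names, and the optional version tag are all hypothetical and serve only to show the failure behaviour.

```python
class SchemaError(ValueError):
    """Raised when an input file does not match the expected schema."""

def validate_schema(columns, required, schema_version=None, expected_version=None):
    """Raise SchemaError if required columns are missing or the schema is stale."""
    missing = sorted(set(required) - set(columns))
    if missing:
        raise SchemaError(f"missing required columns: {missing}")
    if expected_version is not None and schema_version != expected_version:
        raise SchemaError(
            f"stale schema: got {schema_version!r}, expected {expected_version!r}"
        )

REQUIRED = {"ticker", "date", "volatility_pred"}   # illustrative column names

# Passes silently: all required columns are present.
validate_schema(["ticker", "date", "volatility_pred", "xai_top_driver"], REQUIRED)

# Fails loudly instead of continuing with partial data.
try:
    validate_schema(["ticker", "date"], REQUIRED)
except SchemaError as exc:
    error_message = str(exc)
    print(error_message)
```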

XAI preservation

Prediction files should preserve enough explanation metadata to support module-level and final-system audit.

Risk-first fusion

The final Fusion Engine should not bypass position sizing, liquidity constraints, drawdown risk, contagion risk, or regime caps.


Team

| Name | GitHub | Email |
| --- | --- | --- |
| Ibrahim Hussain | ib-hussain | ibrahimbeaconarion@gmail.com |
| Lubabah Moten | lubabahmoten-dev | lubabahmoten@gmail.com |
| Sabeel Nadeem | sabeelnadeem | sabeelnadeem15@gmail.com |

Licence

This repository is licensed under the GNU General Public License v3.0. See LICENSE.


Disclaimer

This project is an academic and research implementation. It is not intended for live trading, portfolio management, investment advice, or automated financial decision-making without independent validation, risk review, regulatory review, and human oversight.