fin-glassbox

code/analysts/ Folder Documentation

1. Folder purpose

The code/analysts/ folder contains the specialist analysis layer for the fin-glassbox project:

An Explainable Multimodal Neural Framework for Financial Risk Management

The analysts convert encoder outputs and risk-engine outputs into interpretable module-level and branch-level signals. They are not final trade approvers. They exist to preserve the project philosophy:

specialisation + modularity + explainability + risk-aware synthesis

2. Folder files and references

File Documentation Role
text_market_label_builder.py Section 6 of this document Builds supervised market-derived labels for text analysts
sentiment_analyst.py SentimentAnalyst.md Learns market-aligned sentiment from FinBERT embeddings
news_analyst.py NewsAnalyst.md Learns document/event impact, importance, and risk relevance
technical_analyst.py TechnicalAnalyst.md Learns trend, momentum, and timing confidence from Temporal Encoder embeddings
qualitative_analyst.py QualitativeAnalyst.md Combines sentiment and news outputs into qualitative branch outputs
quantitative_analyst.py QuantitativeAnalyst.md Learns risk-attention pooling over quantitative risk/position outputs

3. Architectural placement

ENCODERS
├── FinBERT
└── Shared Temporal Attention Encoder
        ↓
ANALYSTS
├── Sentiment Analyst
├── News Analyst
├── Technical Analyst
├── Qualitative Analyst
└── Quantitative Analyst
        ↓
FUSION ENGINE
        ↓
FINAL TRADE APPROVER

The folder contains both first-level specialist analysts and branch-level synthesis analysts.


4. Analyst categories

4.1 Textual/qualitative path

text_market_label_builder.py
sentiment_analyst.py
news_analyst.py
qualitative_analyst.py

This path turns SEC text and FinBERT embeddings into daily qualitative branch outputs.

4.2 Technical/market path

technical_analyst.py

This path turns Temporal Encoder embeddings into technical scores.

4.3 Quantitative synthesis path

quantitative_analyst.py

This path consumes Position Sizing output and learns attention-weighted risk synthesis.


5. End-to-end analyst data flow

5.1 Qualitative branch

SEC textual filings
        ↓
FinBERT embeddings
        ↓
text_market_label_builder.py
        ↓
Sentiment Analyst + News Analyst
        ↓
Qualitative Analyst
        ↓
Fusion Engine

5.2 Quantitative branch

Market data
        ↓
Temporal Encoder embeddings
        ↓
Technical Analyst
        ↓
Risk Engine modules
        ↓
Position Sizing Engine
        ↓
Quantitative Analyst
        ↓
Fusion Engine

6. text_market_label_builder.py

6.1 Purpose

This file builds real supervised labels for the text analysts. It joins SEC text metadata with CIK/ticker mapping and market returns.

It creates labels such as:

sentiment_score_target
sentiment_class_target
news_event_impact_target
news_importance_target
risk_relevance_target
volatility_spike_{risk_horizon}d_target
drawdown_risk_{risk_horizon}d_target

6.2 Design rules

Inputs are SEC metadata available at filing time.
Targets are future market outcomes after the filing date.
The event start is the first trading day strictly after filing_date.
Train-only thresholds are fitted per chronological chunk.
Validation/test thresholds are inherited from the training split.
No dummy labels are created.

6.3 Default horizons

horizons = 1,5,10,20,30
primary_sentiment_horizon = 10
primary_news_horizon = 10
risk_horizon = 30

6.4 Why it matters

This file makes the text analysts thesis-defensible because the supervision comes from future market outcomes rather than subjective sentiment labels.


7. Sentiment Analyst

Documentation: SentimentAnalyst.md

Primary input:

outputs/embeddings/FinBERT/chunk{N}_{split}_embeddings.npy
outputs/results/analysts/labels/text_market_labels_chunk{N}_{split}.csv

Primary output:

outputs/results/analysts/sentiment/chunk{N}_{split}_predictions.csv
outputs/embeddings/analysts/sentiment/chunk{N}_{split}_sentiment_embeddings.npy

Core role:

FinBERT embedding → market-aligned sentiment score, class, confidence, uncertainty, magnitude

8. News Analyst

Documentation: NewsAnalyst.md

Primary input:

outputs/embeddings/FinBERT/chunk{N}_{split}_embeddings.npy
outputs/results/analysts/labels/text_market_labels_chunk{N}_{split}.csv

Primary output:

outputs/results/analysts/news/chunk{N}_{split}_news_predictions.csv
outputs/results/analysts/news/chunk{N}_{split}_attention.csv
outputs/embeddings/analysts/news/chunk{N}_{split}_news_embeddings.npy

Core role:

document chunks → event impact, importance, risk relevance, volatility spike, drawdown risk

9. Technical Analyst

Documentation: TechnicalAnalyst.md

Primary input:

outputs/embeddings/TemporalEncoder/chunk{N}_{split}_embeddings.npy
outputs/embeddings/TemporalEncoder/chunk{N}_{split}_manifest.csv

Primary output:

outputs/results/TechnicalAnalyst/predictions_chunk{N}_{split}.csv
outputs/results/TechnicalAnalyst/xai/

Core role:

Temporal Encoder sequence → trend_score, momentum_score, timing_confidence

10. Qualitative Analyst

Documentation: QualitativeAnalyst.md

Primary input:

outputs/results/analysts/sentiment/chunk{N}_{split}_predictions.csv
outputs/results/analysts/news/chunk{N}_{split}_news_predictions.csv

Primary output:

outputs/results/QualitativeAnalyst/qualitative_events_chunk{N}_{split}.csv
outputs/results/QualitativeAnalyst/qualitative_daily_chunk{N}_{split}.csv
outputs/results/QualitativeAnalyst/xai/

Core role:

Sentiment + News → daily qualitative score, qualitative risk, qualitative confidence

11. Quantitative Analyst

Documentation: QuantitativeAnalyst.md

Primary input:

outputs/results/PositionSizing/position_sizing_chunk{N}_{split}.csv

Primary output:

outputs/results/QuantitativeAnalyst/quantitative_analysis_chunk{N}_{split}.csv
outputs/results/QuantitativeAnalyst/xai/quantitative_analysis_chunk{N}_{split}_xai_summary.json

Core role:

Risk scores + technical context + position sizing → risk-attention quantitative branch output

Fusion requires the trained attention schema:

top_attention_risk_driver
risk_attention_volatility
risk_attention_drawdown
risk_attention_var_cvar
risk_attention_contagion
risk_attention_liquidity
risk_attention_regime
attention_pooled_risk_score

12. Shared implementation standards

All analyst files should follow these standards:

Importable module + executable CLI.
Chronological chunks only.
No dummy data for real training.
HPO with Optuna/TPE where applicable.
Checkpointed training.
Prediction outputs preserve ticker/date/document provenance.
XAI outputs are either embedded in prediction rows or saved as sidecar JSON/CSV files.
CUDA and CPU execution are supported.
Commands should be written as single lines.
CSV/NPY/JSON/PT are the primary output formats.

13. Dependency order

Recommended order after upstream embeddings exist:

1. text_market_label_builder.py
2. sentiment_analyst.py
3. news_analyst.py
4. qualitative_analyst.py
5. technical_analyst.py
6. riskEngine/position_sizing.py
7. quantitative_analyst.py
8. fusion layer

The Technical Analyst can run independently of the text-side modules as long as Temporal Encoder embeddings exist.


14. Full rerun commands

Compile:

python -m py_compile code/analysts/text_market_label_builder.py code/analysts/sentiment_analyst.py code/analysts/news_analyst.py code/analysts/technical_analyst.py code/analysts/qualitative_analyst.py code/analysts/quantitative_analyst.py

Sentiment full rerun:

python code/analysts/sentiment_analyst.py hpo-all --repo-root . --chunks 1,2,3 --trials 30 --device cuda && python code/analysts/sentiment_analyst.py train-best-all --repo-root . --chunks 1,2,3 --device cuda && python code/analysts/sentiment_analyst.py predict-all --repo-root . --chunks 1,2,3 --splits train,val,test --device cuda

News full rerun:

python code/analysts/news_analyst.py hpo-all --repo-root . --chunks 1,2,3 --trials 30 --device cuda && python code/analysts/news_analyst.py train-best-all --repo-root . --chunks 1,2,3 --device cuda && python code/analysts/news_analyst.py predict-all --repo-root . --chunks 1,2,3 --splits train,val,test --device cuda

Technical chunk rerun:

python code/analysts/technical_analyst.py hpo --repo-root . --chunk 1 --trials 40 --device cuda --fresh && python code/analysts/technical_analyst.py train-best --repo-root . --chunk 1 --device cuda --fresh && python code/analysts/technical_analyst.py predict --repo-root . --chunk 1 --split train --device cuda && python code/analysts/technical_analyst.py predict --repo-root . --chunk 1 --split val --device cuda && python code/analysts/technical_analyst.py predict --repo-root . --chunk 1 --split test --device cuda

Qualitative chunk rerun:

python code/analysts/qualitative_analyst.py hpo --repo-root . --chunk 1 --trials 30 --device cuda --fresh && python code/analysts/qualitative_analyst.py train-best --repo-root . --chunk 1 --device cuda --fresh && python code/analysts/qualitative_analyst.py predict-all --repo-root . --chunks 1 --splits train val test --device cuda

Quantitative chunk rerun:

python code/analysts/quantitative_analyst.py hpo --repo-root . --chunk 1 --trials 30 --device cuda --fresh && python code/analysts/quantitative_analyst.py train-best --repo-root . --chunk 1 --device cuda --fresh && python code/analysts/quantitative_analyst.py predict-all --repo-root . --chunks 1 --splits train val test --device cuda

15. Validation commands

Check Quantitative Analyst schema before Fusion:

python - <<'PY'
import pandas as pd
from pathlib import Path
for p in sorted(Path('outputs/results/QuantitativeAnalyst').glob('quantitative_analysis_chunk*_*.csv')):
    cols = pd.read_csv(p, nrows=0).columns
    print(p, 'attention_schema=', 'top_attention_risk_driver' in cols, 'old_schema=', 'top_risk_driver' in cols and 'top_attention_risk_driver' not in cols)
PY

Check Qualitative daily outputs:

python - <<'PY'
from pathlib import Path
for c in [1,2,3]:
    for s in ['train','val','test']:
        p = Path(f'outputs/results/QualitativeAnalyst/qualitative_daily_chunk{c}_{s}.csv')
        print(f'chunk{c}_{s}:', 'OK' if p.exists() else 'MISSING', p)
PY

16. Integration risk notes

Do not mix old and new Quantitative Analyst outputs. Fusion must consume the trained attention schema.

If FinBERT is regenerated, rerun:

Sentiment Analyst
News Analyst
Qualitative Analyst
Fusion Engine

If Temporal Encoder embeddings are regenerated, rerun:

Technical Analyst
Position Sizing
Quantitative Analyst
Fusion Engine

Fusion training requires train outputs from both branch-level analysts:

outputs/results/QuantitativeAnalyst/quantitative_analysis_chunk{N}_train.csv
outputs/results/QualitativeAnalyst/qualitative_daily_chunk{N}_train.csv

The code/analysts/ folder implements the specialist reasoning layer of the project. It converts text embeddings, temporal embeddings, and risk-engine outputs into explainable branch signals. The folder supports the project’s central claim that a distributed, specialised, explainable architecture is more defensible than a single monolithic black-box predictor.