Methodology And Risk
Review the high-level methodology, data framing, and warning philosophy behind the research surface.
This page explains the trust framework behind the research section without exposing internal runbooks or operational thresholds.
Methodology
Statly research is designed around a simple rule: evidence should reflect what could have been known at the time, not what became obvious later.
That is why the product speaks in terms of:
- Point-in-time discipline
- No-lookahead reasoning
- Validation rather than single-run storytelling
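As a rough illustration of point-in-time discipline, the sketch below keeps only records that could already have been read at the moment of a decision. The `Observation` type, its `available_at` field, and the `as_of` helper are hypothetical names introduced for this example; they are not Statly's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: what was measured and when it became knowable."""
    name: str
    value: float
    event_time: datetime    # when the market event happened
    available_at: datetime  # when the record could first have been read


def as_of(observations: list[Observation], decision_time: datetime) -> list[Observation]:
    """Keep only records that were already available at decision_time."""
    return [o for o in observations if o.available_at <= decision_time]


def utc(hour: int, minute: int) -> datetime:
    """Small helper for building example timestamps on one illustrative day."""
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


observations = [
    # Happened at 08:00 but was only published at 08:05.
    Observation("funding_rate", 0.0001, event_time=utc(8, 0), available_at=utc(8, 5)),
    # Happened and became readable at 08:01.
    Observation("mark_price", 42000.0, event_time=utc(8, 1), available_at=utc(8, 1)),
]

# A decision taken at 08:02 must not see the funding record yet.
visible = as_of(observations, decision_time=utc(8, 2))
```

Filtering on availability time rather than event time is the design choice that matters here: a record whose event predates the decision can still be lookahead if it was not yet published.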
Data provenance and hygiene
The public claim should be conservative:
- research uses explicit data families rather than vague "black-box AI outputs"
- candidate evidence is tied to known market inputs
- point-in-time discipline matters more than headline win rate
- provenance and recency are part of the trust framework, not afterthoughts
What we can say publicly today:
- some lanes are vendor-backed and some are natively collected
- manifests, runs, and candidate states are tracked as first-class objects
- the product takes data validation seriously enough to surface stale evidence and warning posture instead of pretending every candidate is equally fresh
What we should not over-claim publicly yet:
- lane-by-lane outlier policy details for every dataset
- a fully published missing-data policy for every exchange feed
- municipality-grade or vendor-grade operational detail that belongs in internal runbooks
For the public-facing provenance framing, see Data Provenance And Hygiene.
Data families
The research system works with the following data families:
| Family | Description |
|---|---|
| Trades | Executed trade data |
| Best bid / offer | Order book top-of-book |
| Order book depth (L2) | Multi-level depth, imbalance, and order book pressure |
| Funding | Perpetual funding rates |
| Open interest | Market-wide open positions |
| Mark price | Exchange mark price |
| Index price | Cross-venue index |
| Liquidations | Forced liquidation events |
| Oracle price | External price feed data |
Some data lanes rely on vendor-backed coverage while others are built on native collection. The customer-facing point is that the research system treats data provenance and validation as part of the trust framework.
Statistical rigor
The research docs should be explicit that screening is not allowed to quietly reward noise.
At the current boundary, the screening engine already uses:
- fold-based out-of-sample evidence
- a multiple-testing correction path
- a concrete selection rule rather than hand-wavy ranking language
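One common way to produce fold-based out-of-sample evidence for time-ordered data is an expanding-window split, where every test block lies strictly after the data used to fit it. This page does not state how Statly builds its folds, so the sketch below is an assumption for illustration only, and `walk_forward_folds` is a hypothetical helper.

```python
def walk_forward_folds(
    n_samples: int, n_folds: int, min_train: int
) -> list[tuple[range, range]]:
    """Split indices 0..n_samples-1 into expanding-window (train, test) pairs.

    Each test block starts where its training block ends, so no test index
    ever precedes a training index.
    """
    if n_folds < 1 or min_train < 1 or n_samples - min_train < n_folds:
        raise ValueError("not enough samples for the requested folds")
    test_size = (n_samples - min_train) // n_folds
    folds = []
    for k in range(n_folds):
        test_start = min_train + k * test_size
        # The final fold absorbs any remainder so no trailing samples are dropped.
        test_end = n_samples if k == n_folds - 1 else test_start + test_size
        folds.append((range(0, test_start), range(test_start, test_end)))
    return folds


folds = walk_forward_folds(n_samples=10, n_folds=3, min_train=4)
```

A feature would then be scored on each test block separately, so a single lucky window cannot carry the result.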
Today, the public code-backed correction in the feature-screen path is:
bonferroni_ic_pvalue.v1
This means Statly is already taking the multiple-comparison problem seriously instead of pretending that the best-looking feature automatically deserves trust.
At a high level, that correction can be summarized as:
adjusted_p_value = min(1, raw_p_value * number_of_tests)
That formula matters because it makes it harder for a feature to survive screening just because many alternatives were tried.
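The summary formula can be sketched in a few lines. This is a minimal illustration of a standard Bonferroni adjustment, not the `bonferroni_ic_pvalue.v1` implementation itself; the function names, the feature names, and the 0.05 significance level are assumptions made for the example.

```python
def bonferroni_adjust(raw_p_values: dict[str, float]) -> dict[str, float]:
    """Multiply each raw p-value by the number of tests, capped at 1."""
    number_of_tests = len(raw_p_values)
    return {
        name: min(1.0, p * number_of_tests)
        for name, p in raw_p_values.items()
    }


def survivors(raw_p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Return the names whose adjusted p-value stays below alpha (assumed level)."""
    adjusted = bonferroni_adjust(raw_p_values)
    return [name for name, p in adjusted.items() if p < alpha]


# Hypothetical screen of three features.
raw = {"feature_a": 0.004, "feature_b": 0.03, "feature_c": 0.2}
kept = survivors(raw)
```

In this example `feature_b` would look significant on its own (0.03 is below 0.05), but after multiplying by three tests it no longer clears the bar, which is exactly the effect the correction is meant to have.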
Biases the docs should name openly
If the docs want to build trust, they should say these words directly:
- look-ahead bias
- overfitting
- multiple testing
- stale evidence
- execution-cost optimism
The product becomes more credible when it shows that a favorable backtest or a strong-looking feature can still be rejected for honest reasons.
Backtest realism
A backtest is only useful if it is trying not to lie.
That means the public docs should keep reinforcing that replay quality depends on:
- holdout review
- paper shadow or paper observation
- cost validation
- slippage assumptions
- implementation drift and latency awareness
The current stack already contains code and tests for execution cost, slippage, latency-aware stress handling, and depth-aware market structure. The customer-facing docs do not need every parameter, but they should clearly state that PnL is not treated as a frictionless fantasy number.
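To make the effect of frictions concrete, the sketch below charges fees, slippage, and a latency drag per traded side, all expressed in basis points of notional. The `CostAssumptions` type, the `net_pnl` function, and every number in the example are hypothetical; they are not Statly parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostAssumptions:
    """Hypothetical per-side costs, in basis points of traded notional."""
    fee_bps: float
    slippage_bps: float
    latency_drag_bps: float


def net_pnl(gross_pnl: float, notional: float, sides: int, costs: CostAssumptions) -> float:
    """Subtract per-side costs from gross PnL.

    `sides` is the number of executions charged (2 for one entry plus one exit).
    """
    per_side_bps = costs.fee_bps + costs.slippage_bps + costs.latency_drag_bps
    total_cost = notional * sides * per_side_bps / 10_000
    return gross_pnl - total_cost


# Illustrative round trip: 12.0 gross on 10,000 notional, 8 bps total cost per side.
costs = CostAssumptions(fee_bps=5.0, slippage_bps=2.0, latency_drag_bps=1.0)
result = net_pnl(gross_pnl=12.0, notional=10_000.0, sides=2, costs=costs)
```

With these assumed numbers the round trip costs 16.0, so a trade that looks profitable gross ends at -4.0 net.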
At the conceptual level, the backtest engine is always trying to defend against a fake equation like:
naive_pnl = gross_feature_excess
and replace it with something closer to:
realistic_pnl = gross_excess - fees - slippage - latency / implementation drag
Why warnings exist
Warnings exist because the system should not lie about what is missing.
If a candidate lacks:
- Recent backtest evidence
- Recent paper observation
- Stronger institutional promotion for live
...the product says that clearly and suggests the safer next step.
A warning is not a hidden block. It is an honest statement about the current evidence gap. The operator still has agency to choose their next action within the workspace.
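The idea of a warning as a statement about an evidence gap, rather than a block, can be sketched as a function that only returns messages. Everything here is hypothetical: the `CandidateEvidence` fields, the boolean simplification of promotion status, the message wording, and especially the 30-day freshness window, since operational thresholds are deliberately not published on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class CandidateEvidence:
    """Hypothetical summary of what evidence exists for one candidate."""
    last_backtest_at: Optional[datetime]
    last_paper_observation_at: Optional[datetime]
    promoted_for_live: bool


def evidence_warnings(
    evidence: CandidateEvidence,
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # assumed window, for illustration only
) -> list[str]:
    """Describe evidence gaps and a safer next step; never blocks anything."""
    def is_stale(ts: Optional[datetime]) -> bool:
        return ts is None or now - ts > max_age

    warnings = []
    if is_stale(evidence.last_backtest_at):
        warnings.append("No recent backtest evidence. Safer next step: re-run the backtest.")
    if is_stale(evidence.last_paper_observation_at):
        warnings.append("No recent paper observation. Safer next step: observe in paper first.")
    if not evidence.promoted_for_live:
        warnings.append("Not yet promoted for live. Safer next step: stay in paper.")
    return warnings


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
candidate = CandidateEvidence(
    last_backtest_at=now - timedelta(days=3),
    last_paper_observation_at=None,
    promoted_for_live=False,
)
messages = evidence_warnings(candidate, now)
```

Because the function returns text and nothing else, the caller stays free to proceed, which mirrors the point that the operator keeps agency.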
Where features fail
The docs should also teach where good-looking candidates can still break:
- thin-liquidity windows
- regime transitions
- cost sensitivity
- live-vs-backtest decay
- partial evidence where backtest exists but paper confirmation is still weak
This is not negative marketing. It is how the product earns trust.