Feature Evidence vs. Backtest Evidence
Understand why a promising feature is not the same thing as a strategy that stays net-positive after costs in a historical replay.
A strong feature and a strong backtest are related, but they are not interchangeable.
The short version
- Feature evidence asks: does this feature carry information that is empirically associated with future returns?
- Backtest evidence asks: if we turn that idea into rules, sizing, and costs, does it still survive as a strategy?
That distinction matters because many weak products collapse both into a single story.
What feature evidence is trying to prove
Feature evidence is about whether the underlying idea carries information before execution rules are layered on top.
At its simplest, the evidence layer is asking:
IC = corr(signal_t, return_{t+h})
If that correlation is positive and stable out of sample, the feature may contain real empirical forward-association.
Examples:
- out-of-sample correlation between the feature and future return
- fold stability across validation windows
- whether the feature is degrading over time
- whether the feature still behaves acceptably across regimes
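As a minimal sketch of the first check in the list above, the IC can be computed by pairing each signal value with the return `h` periods later. The function names, the choice of Pearson correlation, and the toy numbers are assumptions for illustration; the source does not say which correlation estimator the workspace uses.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def information_coefficient(signal, returns, horizon=1):
    """IC = corr(signal_t, return_{t+h}).

    Assumes returns[t] is the return realized in period t, so signal[t]
    is paired with returns[t + horizon].
    """
    if len(signal) != len(returns):
        raise ValueError("signal and returns must be aligned on the same periods")
    if horizon < 1 or horizon >= len(signal):
        raise ValueError("horizon must be at least 1 and shorter than the series")
    return pearson(signal[:-horizon], returns[horizon:])


# Toy data (hypothetical): each return follows the previous period's signal.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
returns = [0.00, 0.01, 0.02, 0.03, 0.04]
ic = information_coefficient(signal, returns, horizon=1)  # close to 1.0 by construction
```

Running the same function separately on each validation window gives the per-fold ICs that the stability and degradation checks rely on.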
In the current workspace surface, the customer-facing evidence block focuses on:
- OOS IC
- IC IR
- Positive folds
- decay or warning context
You can think about the stability part as:
IC IR = mean(IC across folds) / stddev(IC across folds)
That is why a feature can have a positive average IC but still fail to inspire confidence if its fold-to-fold behavior is too erratic.
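The sketch below applies the IC IR ratio to two hypothetical sets of per-fold ICs that share the same average but differ in stability. The fold values are invented for illustration, and using the sample standard deviation is an assumption; the source does not specify sample versus population.

```python
from statistics import mean, stdev


def ic_ir(fold_ics):
    """IC IR = mean(IC across folds) / stddev(IC across folds)."""
    if len(fold_ics) < 2:
        raise ValueError("need at least two folds")
    spread = stdev(fold_ics)  # sample standard deviation (assumption)
    if spread == 0:
        raise ValueError("IC IR is undefined when every fold has the same IC")
    return mean(fold_ics) / spread


def positive_fold_share(fold_ics):
    """Fraction of folds whose IC is strictly positive."""
    return sum(ic > 0 for ic in fold_ics) / len(fold_ics)


# Hypothetical per-fold ICs: both lists average 0.04.
steady = [0.04, 0.05, 0.03, 0.04, 0.04]
erratic = [0.30, -0.22, 0.25, -0.18, 0.05]

steady_ir = ic_ir(steady)
erratic_ir = ic_ir(erratic)
```

Here `steady_ir` comes out far higher than `erratic_ir` even though the average IC is identical, and `positive_fold_share(erratic)` shows that only three of the five folds were positive.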
What a backtest is trying to prove
A backtest is about implementation quality, not just raw empirical forward-association.
It asks what happens when the system has to trade the idea with:
- entries and exits
- position sizing
- execution cost assumptions
- replayed market conditions
- a specific historical window
In simplified form, a backtest is asking something closer to:
Strategy PnL ~= gross excess - fees - slippage - implementation drag
That is why a backtest produces strategy outputs such as:
- PnL
- drawdown
- trade count
- period behavior
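A minimal replay sketch shows how outputs like these fall out of positions, returns, and costs. It is not the product's backtest engine: the `replay` function is hypothetical, fees and slippage are folded into a single assumed `cost_per_unit_turnover` parameter, and the input numbers are made up.

```python
def replay(positions, returns, cost_per_unit_turnover):
    """Replay a position path against period returns and charge costs on turnover.

    positions[t] is the position held during period t (for example -1, 0, 1),
    returns[t] is the asset return in that period.
    """
    if len(positions) != len(returns):
        raise ValueError("positions and returns must cover the same periods")

    net_pnl = []
    trade_count = 0
    previous = 0.0
    for position, period_return in zip(positions, returns):
        turnover = abs(position - previous)
        if turnover > 0:
            trade_count += 1  # any change in position counts as one trade
        gross = position * period_return
        net_pnl.append(gross - turnover * cost_per_unit_turnover)
        previous = position

    # Max drawdown: largest fall of cumulative PnL below its running peak.
    equity = peak = max_drawdown = 0.0
    for pnl in net_pnl:
        equity += pnl
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, peak - equity)

    return {"pnl": equity, "max_drawdown": max_drawdown, "trade_count": trade_count}


# Hypothetical four-period example.
result = replay(
    positions=[1, 1, 0, -1],
    returns=[0.01, -0.02, 0.03, -0.01],
    cost_per_unit_turnover=0.001,
)
```

In this toy run the strategy changes position three times, and the costs on those changes are part of why the net PnL ends slightly negative.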
Why a feature can look good while a backtest looks weak
This is a normal and important failure mode.
Common reasons include:
- the feature is real, but too weak after costs
- the implementation trades too often
- the sizing is too aggressive
- the strategy only works in one regime and collapses elsewhere
- the execution assumptions are too optimistic
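The first two reasons can be made concrete with simple arithmetic. The sketch below assumes a fixed gross edge per period and a cost that scales with the number of trades; both functions and all basis-point figures are hypothetical.

```python
def net_edge_bps(gross_edge_bps, cost_bps_per_trade, trades):
    """Net edge for one period: fixed gross edge minus costs that grow with trading."""
    return gross_edge_bps - cost_bps_per_trade * trades


def breakeven_trades(gross_edge_bps, cost_bps_per_trade):
    """Number of trades per period at which costs consume the whole gross edge."""
    if cost_bps_per_trade <= 0:
        raise ValueError("cost per trade must be positive")
    return gross_edge_bps / cost_bps_per_trade


# Hypothetical feature worth 40 bps per period before costs, 5 bps per trade.
patient = net_edge_bps(40, 5, trades=4)      # 20 bps left after costs
overtrading = net_edge_bps(40, 5, trades=12)  # -20 bps: costs exceed the edge
limit = breakeven_trades(40, 5)               # 8.0 trades per period
```

The feature's information is the same in both cases. Only the implementation changed, which is why feature evidence alone cannot tell you whether the strategy survives costs.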
Why a backtest can look good while the feature evidence is weak
This is also possible, and it is dangerous.
Common reasons include:
- too much parameter freedom
- lucky period selection
- overfitting to one market phase
- a strategy rule set that looks net-positive after costs without strong underlying evidence
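The effect of parameter freedom and lucky selection can be illustrated on pure noise. In the sketch below, each "rule" is a random long/short pattern standing in for one parameter choice, and the returns are random by construction, so no rule has any real edge. The function name, the rule count, and the seed are all hypothetical.

```python
import random
from statistics import mean


def best_of_random_rules(n_rules=200, n_periods=250, seed=7):
    """Pick the best in-sample rule among many edge-free rules, then check it out of sample."""
    rng = random.Random(seed)
    total = 2 * n_periods
    # Pure-noise returns: first half is "in sample", second half is "out of sample".
    returns = [rng.gauss(0.0, 0.01) for _ in range(total)]

    in_sample_scores, out_sample_scores = [], []
    for _ in range(n_rules):
        positions = [rng.choice((-1, 1)) for _ in range(total)]
        pnl = [p * r for p, r in zip(positions, returns)]
        in_sample_scores.append(mean(pnl[:n_periods]))
        out_sample_scores.append(mean(pnl[n_periods:]))

    best = max(range(n_rules), key=lambda i: in_sample_scores[i])
    return {
        "best_in_sample": in_sample_scores[best],
        "same_rule_out_of_sample": out_sample_scores[best],
        "average_in_sample": mean(in_sample_scores),
    }


summary = best_of_random_rules()
```

The selected rule's in-sample score is positive purely because it was the maximum of many tries. Its out-of-sample score carries no such guarantee, which is the sense in which a good-looking backtest without underlying evidence is fragile.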
A good-looking backtest without strong evidence should be treated as fragile. A strong feature without a convincing backtest should be treated as incomplete.
How Statly separates the two
The product should keep these layers honest:
| Layer | Main question |
|---|---|
| Feature evidence | Does the candidate appear to carry information that is empirically associated with future returns? |
| Backtest | Does the implementation's excess return vs. benchmark survive historical replay? |
| Paper | Does the idea still behave coherently under current market conditions? |
| Live | Does the deployment remain trustworthy when real capital is exposed? |
The research surface should never pretend that one layer fully replaces the others.