Performance | BHF Capital

1. Headline numbers

F2_dom AUC: 0.826

Fold std dev: ±0.015

Stack hit rate: 76.1%

Profit factor: 1.66

Past results do not guarantee future performance. The AUC figures come from walk-forward validation of F2_dom; the hit rate and profit factor come from a 79-day backtest of the v133 stack with the adaptive gate on, RTH only. Nothing on this page is a live account statement.

2. F2_dom walk-forward AUC

The microstructure head is the most heavily validated component of the stack. It is evaluated across five purged folds with a 10-minute embargo, on about 1.45M labelled samples drawn from the full MBP-10 history:

Fold	Samples	AUC	Notes
1 (earliest)	~290k	0.841	Highest spread vol in window
2	~290k	0.819	Quiet regime, feature importance shifts
3	~290k	0.832	Mixed regime
4	~290k	0.821	Event-heavy (CPI, NFP)
5 (latest)	~290k	0.817	Most recent, closest to live
Mean ± SD	~1.45M	0.826 ± 0.015	Purged K=5, 10-min embargo
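A quick arithmetic check on the summary row, using only the five rounded fold AUCs from the table. It prints the mean, the sample (n − 1) and population standard deviations, and the largest single-fold deviation from the mean. On these rounded values the sample standard deviation comes out near 0.010, while 0.015 equals fold 1's distance from the mean, so the sketch reports all three dispersion measures side by side.

```python
import statistics

# Per-fold walk-forward AUC, exactly as rounded in the table above.
fold_auc = {1: 0.841, 2: 0.819, 3: 0.832, 4: 0.821, 5: 0.817}

values = list(fold_auc.values())
mean_auc = statistics.mean(values)
sample_sd = statistics.stdev(values)   # n - 1 denominator
pop_sd = statistics.pstdev(values)     # n denominator
max_dev = max(abs(v - mean_auc) for v in values)
lowest_fold = min(fold_auc, key=fold_auc.get)

print(f"mean={mean_auc:.3f} stdev={sample_sd:.4f} pstdev={pop_sd:.4f} "
      f"max_dev={max_dev:.3f} lowest=fold {lowest_fold}")
# prints: mean=0.826 stdev=0.0102 pstdev=0.0091 max_dev=0.015 lowest=fold 5
```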

Reading this honestly: fold 5 is the most recent fold and the closest to live, and it also has the lowest AUC in the set (0.817). We treat that figure, not the mean, as the realistic upper bound for deployed performance.
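The fold construction described at the top of this section can be sketched as a purged, embargoed K-fold split. This is a minimal sketch, assuming each sample carries an observation time `t0` and a label-resolution time `t1` (when its barrier was touched or expired), both in seconds and sorted by `t0`; the function name and signature are hypothetical, not the production splitter. Training samples whose label interval overlaps the test window are purged, and samples starting within the embargo (600 s, i.e. 10 minutes) after the window are dropped.

```python
from typing import Iterator, List, Sequence, Tuple

def purged_kfold(
    t0: Sequence[float],
    t1: Sequence[float],
    k: int = 5,
    embargo_s: float = 600.0,  # 10-minute embargo
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous, time-ordered folds.

    Hypothetical helper. t0[i] is when sample i is observed and
    t1[i] >= t0[i] is when its label resolves; both in seconds, sorted by t0.
    """
    n = len(t0)
    bounds = [j * n // k for j in range(k + 1)]
    for j in range(k):
        lo, hi = bounds[j], bounds[j + 1]
        test_idx = list(range(lo, hi))
        window_start = t0[lo]
        window_end = max(t1[lo:hi])  # last moment any test label is still open
        train_idx = []
        for i in range(n):
            if lo <= i < hi:
                continue
            # Purge: the training label interval overlaps the test window.
            overlaps = t0[i] <= window_end and t1[i] >= window_start
            # Embargo: the training sample starts just after the test window closes.
            embargoed = window_end < t0[i] <= window_end + embargo_s
            if not overlaps and not embargoed:
                train_idx.append(i)
        yield train_idx, test_idx
```

With one sample per minute and a 5-minute label horizon, the first fold's training set begins only after both the overlapping labels and the 10-minute embargo have been skipped, which is the leakage the purge is meant to prevent.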

3. Full-stack backtest — adaptive gate vs no gate (v133)

The full believe stack (tick ML + XGB 5m + F2_dom) is run against the 79-day live tick capture from 2026-01-29 through 2026-04-17, RTH only, with commission, marketable-limit entry slippage rules, and random 1-2 tick stop slippage applied. Same tape, same models, same order logic — the only difference between rows is whether the v133 adaptive regime gate is on or off.
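A minimal sketch of how the cost assumptions in the paragraph above might be applied to a single 1-lot ES trade. The ES tick size (0.25 index points) and tick value ($12.50) are contract facts; the commission figure, the `Trade` fields, and the rule that a marketable limit fills exactly at its limit price are hypothetical placeholders, not the backtester's actual parameters.

```python
import random
from dataclasses import dataclass

TICK = 0.25            # ES minimum price increment, index points
TICK_VALUE = 12.50     # USD per tick per ES contract
COMMISSION_RT = 4.00   # hypothetical round-turn commission, USD

@dataclass
class Trade:
    side: int            # +1 long, -1 short
    entry_limit: float   # marketable-limit price sent at entry
    exit_price: float    # model exit price before slippage
    exit_reason: str     # "target", "stop" or "time"

def trade_pnl(t: Trade, rng: random.Random) -> float:
    """Net P&L in USD for one contract after stop slippage and commission."""
    entry = t.entry_limit  # assumption: the marketable limit fills at its limit price
    exit_px = t.exit_price
    if t.exit_reason == "stop":
        # Stops fill 1 or 2 ticks worse than the trigger, chosen at random.
        exit_px -= t.side * rng.randint(1, 2) * TICK
    ticks = t.side * (exit_px - entry) / TICK
    return ticks * TICK_VALUE - COMMISSION_RT
```

Passing the `random.Random` instance in explicitly keeps replays reproducible: the same seed gives the same slippage draws on the same tape.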

Configuration	Trades	Hit Rate	Profit Factor	Net (BT $)	Max DD (BT $)
No gate (v131 behaviour)	9,361	75.9%	1.63	+131,198	11,372
Adaptive gate on (v133)	9,136	76.1%	1.66	+134,164	7,938
Delta	−225	+0.2 pt	+0.03	+$2,966	−$3,434 (−30.2%)

Reading this table: the adaptive gate suppresses ~2.4% of trades (225 of 9,361), and the suppressed trades fall in the regimes where the stack was losing money. Net P&L goes up, peak drawdown falls ~30%, and the equity curve is visibly cleaner through the two worst weeks of the window. Numbers are backtest dollars on a 1-lot ES stack, not live account P&L.
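The table's columns can be reproduced from a per-trade P&L series. This sketch assumes common definitions that the page does not spell out: hit rate counts only strictly positive trades as wins, profit factor is gross profit over gross loss, and max drawdown is the largest peak-to-trough fall in cumulative P&L. The last lines recompute the delta row from the published figures.

```python
from typing import Dict, Sequence

def stack_metrics(pnl: Sequence[float]) -> Dict[str, float]:
    """Summary stats over per-trade P&L in backtest dollars, in trade order."""
    wins = [p for p in pnl if p > 0]
    gross_profit = sum(wins)
    gross_loss = -sum(p for p in pnl if p < 0)
    equity = peak = max_dd = 0.0
    for p in pnl:
        equity += p
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)  # peak-to-trough on cumulative P&L
    return {
        "trades": len(pnl),
        "hit_rate": len(wins) / len(pnl) if pnl else 0.0,
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "net": gross_profit - gross_loss,
        "max_dd": max_dd,
    }

# Delta row recomputed from the published table figures.
no_gate = {"trades": 9361, "net": 131198, "max_dd": 11372}
gated = {"trades": 9136, "net": 134164, "max_dd": 7938}
suppressed_share = (no_gate["trades"] - gated["trades"]) / no_gate["trades"]
dd_change = (gated["max_dd"] - no_gate["max_dd"]) / no_gate["max_dd"]
print(f"suppressed={suppressed_share:.1%} "
      f"net_delta={gated['net'] - no_gate['net']:+,} dd_change={dd_change:.1%}")
# prints: suppressed=2.4% net_delta=+2,966 dd_change=-30.2%
```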

3b. F2_dom ablation (retained from v131)

For context, this is the historical F2_dom ablation, measured on an earlier 78-day tape. The microstructure head remains the largest single driver of the stack’s edge.

Configuration	Hit Rate	Profit Factor	Result
Stack, F2_dom disabled	67.7%	1.02	Near-breakeven pre-commission
Stack, F2_dom enabled	70.6%	1.35	Walk-forward consistent

The lift from the F2_dom head is real and consistent across folds: modest in hit rate (+2.9 pt, from 67.7% to 70.6%) but large in profit factor (from 1.02 to 1.35). This is what a microstructure signal is supposed to do: tilt the edge, not replace it.
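Hit rate and profit factor are linked through the average win-to-loss size. Assuming every trade is either a win or a loss (no scratches) and working with averages, PF = p·W / ((1 − p)·L), so the implied payoff ratio is W/L = PF·(1 − p)/p. Applied to the two ablation rows, this back-of-envelope reading suggests the lift comes from both a higher hit rate and a larger average win relative to the average loss; it is derived from the published figures, not a measured statistic.

```python
def implied_payoff_ratio(hit_rate: float, profit_factor: float) -> float:
    """Average win / average loss implied by hit rate p and profit factor PF.

    PF = (p * W) / ((1 - p) * L)  =>  W / L = PF * (1 - p) / p,
    assuming every trade is either a win or a loss.
    """
    return profit_factor * (1.0 - hit_rate) / hit_rate

for name, p, pf in [("F2_dom disabled", 0.677, 1.02), ("F2_dom enabled", 0.706, 1.35)]:
    print(f"{name}: avg win / avg loss ~ {implied_payoff_ratio(p, pf):.2f}")
# prints 0.49 for disabled and 0.56 for enabled
```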

4. Triple-barrier label distribution

Before trusting any classification metric, you should see the label distribution. Below is the distribution of the three barrier outcomes across the 1.45M F2_dom training samples:

Outcome	Share	Interpretation
Upper barrier touched (+12 ticks)	~41%	Take-profit realised
Lower barrier touched (-8 ticks)	~44%	Stop-loss realised
Vertical barrier (time-out)	~15%	Neither touched; exit at expiry

The label set is close to balanced and not dominated by time-outs, which is a precondition for the AUC number above to be meaningful.
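For readers unfamiliar with the scheme, this sketch labels one entry with the barriers in the table: +12 ticks upper, −8 ticks lower, and a vertical time-out. It assumes a long-side label on a sequence of ES prices with a 0.25-point tick and a horizon counted in price updates; the function names and the `max_steps` default are illustrative assumptions, not the F2_dom labeller's parameters.

```python
from collections import Counter
from typing import Dict, Sequence

TICK = 0.25  # ES tick size in index points

def triple_barrier_label(
    prices: Sequence[float],
    entry_idx: int,
    upper_ticks: int = 12,
    lower_ticks: int = 8,
    max_steps: int = 300,  # hypothetical vertical barrier, in price updates
) -> int:
    """+1 if the upper barrier is touched first, -1 if the lower, 0 on time-out."""
    entry = prices[entry_idx]
    upper = entry + upper_ticks * TICK
    lower = entry - lower_ticks * TICK
    last = min(entry_idx + max_steps, len(prices) - 1)
    for px in prices[entry_idx + 1 : last + 1]:
        if px >= upper:
            return 1
        if px <= lower:
            return -1
    return 0  # vertical barrier: neither side touched before expiry

def label_shares(labels: Sequence[int]) -> Dict[int, float]:
    """Share of upper (+1), lower (-1) and time-out (0) outcomes."""
    counts = Counter(labels)
    return {k: counts[k] / len(labels) for k in (1, -1, 0)}
```

Running `label_shares` over every labelled sample is what produces a distribution table like the one above.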

5. Feature importance (F2_dom v133)

Importance is gain-based, averaged over the five purged folds. We track the top 16 features each retrain and alert on large rank shifts. The table below lists the top 10 from the v133 snapshot; absolute gain values are withheld because they are retrain-specific and not decision-useful off the training host.

Rank	Feature	Family
1	`book_imb`	Aggregate imbalance
2	`tob_ratio`	Top of book
3	`top3_imb`	Near-touch imbalance
4	`mid_mom`	Microprice drift
5	`imb_std`	Rolling imbalance vol
6	`bid_grad_2`	Bid gradient L2
7	`ask_grad_2`	Ask gradient L2
8	`spread_ticks`	Spread
9	`depth_ratio`	Depth skew
10	`queue_age`	Price-level staleness
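The ranking step can be sketched as averaging each feature's gain across folds and sorting. This assumes per-fold gain values are already available as plain dictionaries (exported from whatever library trained the head) and that a feature missing from a fold's dictionary contributed zero gain in that fold; the function name is hypothetical and the gain numbers below are made up.

```python
from statistics import mean
from typing import Dict, List

def rank_by_mean_gain(fold_gains: List[Dict[str, float]], top_n: int = 16) -> List[str]:
    """Return the top_n features by gain averaged over folds (ties broken by name)."""
    features = set().union(*fold_gains)
    avg = {f: mean(g.get(f, 0.0) for g in fold_gains) for f in features}
    return sorted(avg, key=lambda f: (-avg[f], f))[:top_n]

# Illustrative gains for two folds only; not real F2_dom values.
folds = [
    {"book_imb": 10.0, "tob_ratio": 8.0, "spread_ticks": 1.0},
    {"book_imb": 9.0, "tob_ratio": 9.0},
]
print(rank_by_mean_gain(folds))  # ['book_imb', 'tob_ratio', 'spread_ticks']
```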

6. What we deliberately do not publish

Daily P&L. A marketing site is not a trade blotter. Daily dollar moves say little about long-run edge and a great deal about noise.
Cumulative equity curves for the live account. Same reason. They invite cherry-picking start dates.
Sharpe ratios on short windows. A Sharpe ratio estimated on less than a year of intraday trading is dominated by sampling noise.
Single-fold “best” metrics. If we only showed fold 1 we could claim 0.841. We show the mean and the standard deviation.
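The point about short-window Sharpe ratios can be made concrete with the standard large-sample approximation for the standard error of an estimated Sharpe ratio under i.i.d. returns, SE ≈ sqrt((1 + SR²/2) / T) per period, rescaled to annual units. The i.i.d. assumption and the 252-day year are simplifications, and the function name is hypothetical.

```python
import math

def sharpe_standard_error(annual_sharpe: float, n_days: int, days_per_year: int = 252) -> float:
    """Approximate standard error of an annualised Sharpe ratio estimated on n_days.

    Uses SE(SR_daily) ~ sqrt((1 + SR_daily**2 / 2) / n_days), assuming i.i.d.
    daily returns, then rescales to annual units.
    """
    daily = annual_sharpe / math.sqrt(days_per_year)
    se_daily = math.sqrt((1.0 + 0.5 * daily ** 2) / n_days)
    return se_daily * math.sqrt(days_per_year)

for days in (56, 252, 1260):
    print(f"{days:>5} days: Sharpe 2.0 +/- {sharpe_standard_error(2.0, days):.2f}")
```

Under these assumptions, a true annual Sharpe of 2.0 measured over one year of trading days carries a standard error of roughly 1.0, so a single year cannot reliably separate a Sharpe of 1 from a Sharpe of 3.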

7. What we monitor day to day

Feature drift. Any core feature’s importance rank shifting by more than two positions triggers a retrain review.
Bracket fill rate. Measured as the share of entries that receive a marketable-limit fill. A decline in fill rate is provisionally read as a market-regime shift rather than a broker issue, until both explanations have been checked.
Walk-forward AUC drift. Each weekly retrain produces a new fold-5 AUC. We care about the slope, not the level.
Backtest vs live parity. Every live session is replayed the next day through the backtester against the captured tick stream. A trade that should have fired but did not is treated with the same seriousness as a trade that fired but should not have.
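Three of the checks above reduce to small computations. This is a minimal sketch with hypothetical function names: trade identity in the parity check is assumed to be some stable key (for example signal time plus side), and the AUC slope is an ordinary least-squares fit over equally spaced weekly retrains. The two-position threshold comes from the feature-drift rule above.

```python
from typing import Dict, List, Sequence, Set, Tuple

def rank_shift_alerts(prev: List[str], curr: List[str], core: Sequence[str],
                      max_shift: int = 2) -> Dict[str, int]:
    """Core features whose rank moved by more than max_shift positions.

    A feature that drops out of the tracked list is treated as ranked just past its end.
    """
    def rank(lst: List[str], f: str) -> int:
        return lst.index(f) if f in lst else len(lst)

    alerts = {}
    for f in core:
        shift = rank(curr, f) - rank(prev, f)  # positive = fell down the ranking
        if abs(shift) > max_shift:
            alerts[f] = shift
    return alerts

def auc_slope(weekly_auc: Sequence[float]) -> float:
    """Least-squares slope of fold-5 AUC per weekly retrain (needs >= 2 points)."""
    n = len(weekly_auc)
    x_bar = (n - 1) / 2
    y_bar = sum(weekly_auc) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(weekly_auc))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den

def parity_gaps(replayed: Set[str], live: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split replay/live disagreements into missed and spurious trade keys."""
    missed = replayed - live    # should have fired, but did not
    spurious = live - replayed  # fired, but should not have
    return missed, spurious
```

Returning missed and spurious trades as two separate sets keeps both failure modes visible, in line with treating them with equal seriousness.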

Nothing on this page is an offer, solicitation, or investment advice. Past walk-forward and backtest results do not guarantee future live performance. Commission, exchange fees, slippage, and regime changes can materially affect results. BHF Capital is an informational brand of Rare Bird Holdings LLC.