TL;DR - We added a one-bit surprise gate (FEPGate) from our SIML cognitive sidecar to the frozen V-JEPA 2 planning loop.
With zero retraining and the same CEM/MPC budget, that tiny hook reshapes the latent energy surface and delivers big wins:
- Final error: 0.193 m → 0.109 m (~50% ↓)
- Monotonicity: 0.12 → 0.43 (>3× ↑)
- Per-step latency: 2.28 s → 1.13 s (≈2× faster)
- Energy/episode: 0.058 Wh → 0.029 Wh (≈2× ↓)
- EDP (energy × delay): ~4× lower
Why this matters
Even world-class latent planners waste compute when the environment isn’t surprising, and they can get stuck spiraling around spurious minima when it is. A single cognitive bit fixes both:
- Spend compute only when the world deviates from prediction (surprise ↑).
- Skip expensive re-plans when everything tracks (surprise ↓).
This is the smallest practical step toward a cognitive OS layer that governs any latent system - not just V-JEPA 2 - with a universal, model-agnostic feedback signal.
What we built
- B1 - Grip-bit: exposes the gripper open/close as a single bit to both sampling and scoring (cleaner grasp/place phases). (Ablation left to appendix/code.)
- B2 - FEPGate (the star): a SIML sidecar computes a normalized surprise score s(a; z_k) on CEM/MPC elites at latent state z_k (sketched in code after this list):
  - If s > τ → reject/penalize those elites (reshape the elite set).
  - If s ≤ τ → leave them alone (no unnecessary re-plans).
- Zero retraining: V-JEPA 2 encoder/predictor are frozen throughout.
- Same budget: horizon, samples, and iterations unchanged; we only add the bit-level hook.
- τ-calibration (obs-driven): one short warm-up per scene: sample the L1-ball at z_k, measure surprise, set τ at the 5th–10th percentile (empirically ~1e-4 … 1e-3).
- Adaptive memory (bonus): SIML schemas expand with novelty spikes and compress when surprise stays low, keeping entropy and compute in check without touching V-JEPA's weights.
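In code, the gate and its threshold calibration amount to only a few lines. Below is a minimal sketch: `surprise_fn` stands in for the SIML sidecar's s(a; z_k) call, and the L1-ball sampler mirrors the action sampling in our setup; names, shapes, and the sampling scheme are illustrative, not the exact implementation.

```python
# Minimal FEPGate sketch. `surprise_fn(a, z_k)` is a placeholder for the SIML
# sidecar call; action dimensions and sampling scheme are illustrative.
import numpy as np

def calibrate_tau(surprise_fn, z_k, action_dim=7, radius=0.075, n=256, pct=7.5, rng=None):
    """Warm-up: sample action deltas in an L1-ball, score surprise at the current
    latent z_k, and set tau at a low (5th-10th) percentile of those scores."""
    rng = rng or np.random.default_rng(0)
    dirs = rng.standard_normal((n, action_dim))
    dirs /= np.abs(dirs).sum(axis=1, keepdims=True)            # unit L1 norm
    actions = dirs * radius * rng.uniform(0, 1, size=(n, 1))   # points inside the L1-ball
    scores = np.array([surprise_fn(a, z_k) for a in actions])
    return float(np.percentile(scores, pct))  # typically lands around 1e-4 ... 1e-3

def gate_elites(surprise_fn, elites, z_k, tau):
    """Keep low-surprise elites; reject high-surprise ones before execution.
    `elites` is an (n_elites, action_dim) array of candidate actions."""
    scores = np.array([surprise_fn(a, z_k) for a in elites])
    keep = scores <= tau
    return elites[keep] if keep.any() else elites  # fall back if everything is gated
```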
How it works (30-second tour)
- Predict. V-JEPA 2-AC rolls out candidate futures ẑ_{t+1} from z_t.
- Gate elites (pre-exec). SIML computes surprise on elite candidates; high-surprise ones are gated out before execution.
- Act. Execute the best action a_t.
- Update (post-exec). Encode the new obs → z_{t+1}; SIML updates surprise/memory.
This one-bit feedback carves away bad pockets in the search space and preserves curvature where dynamics are feasible.
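In pseudocode, one control step then looks roughly like this; `planner`, `siml`, and `env` expose placeholder methods here, not the actual V-JEPA 2 or SIML APIs:

```python
# One gated control step (sketch; all method names are illustrative placeholders).
def gated_step(env, planner, siml, obs, goal_z, tau):
    z_t = planner.encode(obs)                    # frozen V-JEPA 2 encoder
    elites = planner.cem_elites(z_t, goal_z)     # predict: roll out candidate futures
    kept = [a for a in elites if siml.surprise(a, z_t) <= tau]  # pre-exec gate
    a_t = planner.best_action(kept or elites)    # act (fall back if all elites are gated)
    obs_next = env.step(a_t)                     # execute in the environment
    z_next = planner.encode(obs_next)            # post-exec: encode the new observation
    siml.update(z_t, a_t, z_next)                # update surprise / memory schemas
    return obs_next, z_next
```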
Results
Setup. 100 episodes of single-goal reaching (6–7 steps/ep); 7-DoF action deltas sampled in an L1-ball (r = 0.075). GPU power logged via NVML at 1 Hz. Single NVIDIA GPU; sidecar on CPU. No fine-tuning. Budget < $100.
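For reference, per-episode energy can be integrated from the 1 Hz NVML power samples roughly as below; this is a sketch assuming the pynvml bindings, not the exact logging script from our runs.

```python
# 1 Hz GPU power sampler via NVML (sketch; assumes the pynvml bindings).
import time
import pynvml

def sample_energy_wh(stop_event, gpu_index=0):
    """Poll GPU power at 1 Hz until `stop_event` (e.g. a threading.Event) is set,
    then return the integrated energy for the episode in Wh."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    watts = []
    try:
        while not stop_event.is_set():
            watts.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)  # mW -> W
            time.sleep(1.0)
    finally:
        pynvml.nvmlShutdown()
    return sum(watts) / 3600.0  # each 1 Hz sample covers ~1 s, so Wh = sum(W) * 1 s / 3600
```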
Headline wins (FEPGate vs OFF):
- Error: 0.193 m → 0.109 m
- Smooth progress: 0.12 → 0.43
- Latency: 2.28 s → 1.13 s
- Energy/ep: 0.058 Wh → 0.029 Wh
- EDP (energy × latency, product of means): OFF ≈ 0.132, FEP ≈ 0.033 → ~4× lower
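For completeness, those EDP figures are just the product of the means reported above:

$$
\mathrm{EDP}_{\mathrm{OFF}} \approx 0.058\,\mathrm{Wh} \times 2.28\,\mathrm{s} \approx 0.132\,\mathrm{Wh\cdot s},
\qquad
\mathrm{EDP}_{\mathrm{FEP}} \approx 0.029\,\mathrm{Wh} \times 1.13\,\mathrm{s} \approx 0.033\,\mathrm{Wh\cdot s},
$$

i.e. roughly a 4× reduction.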
Figure 1 - Lower error, ~2× faster decisions, and ~2× lower energy with FEPGate.
What it feels like: direct trajectories when the model is right; immediate course-corrections the instant reality diverges. No wasted search when nothing changed.
Implementation notes
- IPC: a tiny CLI (surprise, step) exchanging .npy buffers plus a small JSON payload (see the sketch below).
- Placement: gate only the elite set each CEM/MPC iteration (orders of magnitude fewer sidecar calls; same effect).
- Threshold: τ from an observation-driven warm-up; typical values in our runs land around 1e-4 … 1e-3.
- Memory: a runtime schema expand/contract policy keeps useful novelty while pruning redundancy.
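As an illustration of the IPC shape, one surprise round-trip looks roughly like this; the surprise verb comes from our CLI, but the binary name, flags, file paths, and JSON field below are hypothetical.

```python
# One sidecar round-trip over .npy buffers + JSON (binary name, flags, paths,
# and the "scores" field are illustrative, not the exact CLI contract).
import json
import os
import subprocess
import numpy as np

def elite_surprise(elites: np.ndarray, z_k: np.ndarray, workdir: str = "/tmp/siml"):
    os.makedirs(workdir, exist_ok=True)
    np.save(f"{workdir}/elites.npy", elites)   # candidate actions from CEM/MPC
    np.save(f"{workdir}/z_k.npy", z_k)         # current latent state
    out = subprocess.run(
        ["siml", "surprise",
         "--elites", f"{workdir}/elites.npy",
         "--latent", f"{workdir}/z_k.npy"],
        capture_output=True, text=True, check=True,
    )
    return np.array(json.loads(out.stdout)["scores"])  # one surprise score per elite
```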
About the Author
This research was conducted by Deep SIML Labs, an independent research lab exploring the frontiers of artificial life, cognition, and self-organizing intelligence.
To stay informed or collaborate, contact us.