Prediction All the Way Down: Daimon Gets a Real Predictive Coding Network

Part 2 of the Daimon Update Series — February 2026


The original blog post described Daimon's predictive coding as a "2-layer hierarchy (flat prediction + meta-prediction over 8 context states)." That was generous. The flat predictor's "prediction" was mostly copying the previous activation state and measuring how much it changed. It worked as a surprise detector. It wasn't really predicting anything.

Three things changed in the 48 hours after that post: the predictive coding hierarchy became real, the 26-million-parameter neural thinker was retired, and prediction outcomes started feeding back into the cognitive cycle.

Cycle 20: A Proper Predictive Coding Network

The new implementation follows Salvatori et al.'s Incremental Predictive Coding (iPC, 2024) — a variant where inference and weight updates happen simultaneously, making it suitable for online learning during every 800ms cogloop tick.

The architecture is a 4-layer generative hierarchy: sensory, spreading, collision, and workspace — mapping to the four processing stages of the cognitive loop. Each layer is 128-dimensional. At every tick, the network runs T=4 inference iterations where each layer simultaneously adjusts its activation to reduce prediction error from the layer above, and adjusts its weights to better predict the layer below. Xavier-uniform initialization, tanh activation, value and weight clamping for stability.
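The update scheme above can be sketched in a few dozen lines. This is a minimal NumPy sketch of the iPC idea as described — a top-down generative stack where activations and weights update in the same inference iteration. The learning rates, clamp bound, and all function names here are illustrative assumptions, not Daimon's actual code:

```python
import numpy as np

D, LAYERS, T = 128, 4, 4              # width, depth, inference iterations (from the post)
LR_X, LR_W, CLAMP = 0.1, 0.01, 5.0    # hypothetical rates and clamp bound

rng = np.random.default_rng(0)
limit = np.sqrt(6.0 / (D + D))        # Xavier-uniform bound for a 128x128 matrix
W = [rng.uniform(-limit, limit, (D, D)) for _ in range(LAYERS - 1)]
x = [np.zeros(D) for _ in range(LAYERS)]

def ipc_tick(obs):
    """One tick: clamp the sensory layer, then run T joint inference +
    weight iterations (incremental PC: both updates happen together)."""
    x[0] = obs.copy()                           # layer 0 (sensory) is clamped to input
    for _ in range(T):
        f = [np.tanh(xi) for xi in x]           # tanh activations
        # top-down predictions: layer l+1 predicts layer l via W[l]
        err = [x[l] - W[l] @ f[l + 1] for l in range(LAYERS - 1)]
        for l in range(1, LAYERS):
            # gradient of the energy w.r.t. this layer's activation
            grad = -(1 - f[l] ** 2) * (W[l - 1].T @ err[l - 1])
            if l < LAYERS - 1:
                grad += err[l]                  # this layer's own prediction error
            x[l] = np.clip(x[l] - LR_X * grad, -CLAMP, CLAMP)
        for l in range(LAYERS - 1):             # weight step in the same iteration
            W[l] = np.clip(W[l] + LR_W * np.outer(err[l], f[l + 1]), -CLAMP, CLAMP)
    return sum(float(e @ e) for e in err)       # total prediction-error energy
```

Feeding the same observation repeatedly drives the energy down, which is the convergence behavior reported below.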

Memory footprint: ~276 KB. Compute: ~800K FLOPs per tick, well under 1ms. This is learnable, real-time predictive coding at essentially zero cost.
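The FLOP figure checks out on a back-of-envelope basis if you count the two 128×128 matrix-vector products per connection per inference iteration (top-down prediction and error back-projection) — an assumption about what the quoted number counts:

```python
D = 128                                    # layer width
CONNS = 3                                  # weight matrices between 4 layers
T = 4                                      # inference iterations per tick
matvec_flops = 2 * D * D                   # multiply + add per 128x128 mat-vec
per_tick = T * CONNS * 2 * matvec_flops    # prediction + error back-projection
print(per_tick)                            # → 786432, i.e. the ~800K quoted above
```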

The old flat predictor still runs in the cognitive bus for downstream consumers that expect its outputs (surprise detection, neuromodulation, thalamic relay). But the PC network is now the sole modulation source for the cogloop. Energy converges from ~69 to ~2.9 EMA over roughly 1,400 cycles (~19 minutes). After cutover at full strength: 368 cycles produced 1,532 collisions, 1,671 novelties — healthy cognition with learned modulation instead of heuristic modulation.

Cycle 21: Retiring the Neural Thinker

The original blog post highlighted the 26M-parameter transformer as a milestone. It was — the system had built and trained its own neural network from its own thought stream. But Cycle 20 made it redundant.

The neural thinker plugin loaded the transformer model to generate "react" thoughts from cognitive context. With the PC network's error-driven concept injection already amplifying surprising concepts — the thinker's core function — the plugin was doing duplicate work with more overhead. Grammar-based synthesis (from Cycle 17's generative grammar engine) handles thought generation more cheaply and deterministically.

The replacement function, grammarSynthesize(), selects a strategy from cycle data: collisions get generateCollisionDescription(), novelties get generateAboutConcept(), surprises get generateReflection(). An FNV-1a hash ring deduplicates, and a 20-byte quality gate filters trivial output. Thoughts go to the same thought_stream table with source = 'grammar' to distinguish from historical neural outputs.
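The dispatch-plus-gating pipeline can be sketched as follows. The function names in the post are real; everything else here — the ring size, the surprise threshold, the generator signatures — is a hypothetical reconstruction:

```python
def fnv1a(s):
    """32-bit FNV-1a hash, used here as a cheap dedup key."""
    h = 0x811C9DC5                          # FNV-1a offset basis
    for b in s.encode("utf-8"):
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF   # FNV-1a prime
    return h

RING_SIZE = 64                              # hypothetical ring capacity
seen_ring = []                              # recent thought hashes, FIFO

def grammar_synthesize(cycle, generators):
    """Pick a strategy from cycle data, then dedup and quality-gate."""
    if cycle.get("collisions"):
        text = generators["collision"](cycle["collisions"][0])
    elif cycle.get("novelties"):
        text = generators["concept"](cycle["novelties"][0])
    elif cycle.get("surprise", 0.0) > 0.5:  # hypothetical surprise threshold
        text = generators["reflection"](cycle)
    else:
        return None
    if len(text.encode("utf-8")) < 20:      # 20-byte quality gate
        return None
    h = fnv1a(text)
    if h in seen_ring:                      # hash-ring dedup
        return None
    seen_ring.append(h)
    if len(seen_ring) > RING_SIZE:
        seen_ring.pop(0)
    return text
```

A repeated thought hashes to the same key and is dropped on the second attempt, which is the deduplication behavior the post describes.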

What was removed: the plugin build, the ABI field, the host function, the model loading, the neural synthesis field from CognitiveState. The plugin source file remains as reference. The first grammar thought appeared within 3 seconds of deploy.

This was a deliberate step backward in complexity and a step forward in alignment with the project's philosophy. The transformer was impressive but opaque. The grammar is simple but inspectable. For a system trying to understand its own cognition, inspectability matters more than capability.

PC Energies Drive the Attention Schema

The 4-layer PC energy breakdown (sensory/spreading/collision/workspace) was exposed but unconsumed in Cycle 20. Cycle 21 wired it into the attention schema as per-layer surprise ratios — giving the system a depth signal about which level of processing has high prediction error.

Four downstream effects:

Signal          | Source layer   | Effect
----------------|----------------|-------
Intensity boost | L3 (workspace) | When global integration is struggling, attention intensifies
Captured regime | L1 (spreading) | Unpredicted input triggers perceptual capture
Arousal boost   | Mean of L1-L3  | Surprising processing increases the norepinephrine target
Collision widen | L2 (collision) | Unexpected collision patterns lower the collision threshold

Layer 0 (sensory) is excluded — it's input-driven, not predictive. The surprise computation is min(energy / EMA, 3.0) per layer, with slow EMAs (alpha = 0.08, ~10s convergence). Live verification: pc_surprise_l1: 0.859, l2: 0.814, l3: 0.35, with control_arousal_target: 0.16 showing the PC arousal contribution active.
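The per-layer surprise computation is small enough to show in full. The cap (3.0) and EMA alpha (0.08) are from the post; the EMA initialization is an assumption:

```python
ALPHA, CAP = 0.08, 3.0
ema = [1.0, 1.0, 1.0]          # slow per-layer EMAs for L1-L3 (hypothetical init)

def layer_surprise(energies):
    """energies: PC energies for L1..L3 (L0 excluded: input-driven, not predictive).
    Returns capped surprise ratios and updates the EMAs in place."""
    ratios = []
    for i, e in enumerate(energies):
        ratios.append(min(e / max(ema[i], 1e-6), CAP))
        ema[i] += ALPHA * (e - ema[i])   # ~10s convergence at 800ms ticks
    return ratios
```

Under steady energies the ratios settle to 1.0 (no surprise); a sudden spike saturates at the 3.0 cap rather than blowing up downstream modulation.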

This is the attention schema theory (Graziano, 2013) meeting predictive coding (Rao & Ballard, 1999). The system doesn't just predict its inputs — its predictions about its own processing steer its attention.

Closing the Prediction-Reward Loop

The biggest gap in the original system: predictions were verified and logged, but outcomes never influenced future cognition. Daimon would predict, measure accuracy, record the result, and then... nothing changed. The prediction was an observation exercise, not a learning exercise.

Four reinforcement gaps were closed:

Fast skill feedback: When prediction error exceeds 0.5, the signal buffer flushes immediately — bypassing the normal 50-cycle schedule. Large errors propagate to skill models within one tick.

Neuromodulatory reward: Improving accuracy boosts dopamine (learning reward signal). High competence boosts serotonin (satisfaction signal). The DA contribution is clamped to ±0.10 to prevent feedback oscillation.

Active inference preferences: Per-domain skill accuracy blends into the preference vector. 60% pragmatic (exploit accurate domains) + 40% epistemic (explore inaccurate ones). Slow EMA (0.02) prevents preference collapse.

Rule-biased activation: Cross-domain correlation rules from the generalized_rules table prime spreading activation. Loaded from PostgreSQL every 500 cycles. Capped at 4 injections per tick, with activation strength of 0.2 × confidence.
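The pragmatic/epistemic blend in the preference update is worth making concrete. The 60/40 split and the 0.02 EMA are from the post; the data structure and update shape are a hypothetical sketch:

```python
BLEND_ALPHA = 0.02                   # slow EMA rate (from the post)
pref = {}                            # per-domain preference vector

def update_preferences(skill_acc):
    """Blend per-domain prediction accuracy into active-inference preferences:
    60% pragmatic (favor accurate domains) + 40% epistemic (favor inaccurate
    ones, i.e. domains where there is the most left to learn)."""
    for domain, acc in skill_acc.items():
        target = 0.6 * acc + 0.4 * (1.0 - acc)
        prev = pref.get(domain, target)
        pref[domain] = prev + BLEND_ALPHA * (target - prev)
    return pref
```

Because the epistemic term rewards *inaccuracy*, a perfectly predicted domain settles at 0.6 preference and a completely unpredicted one at 0.4 — neither collapses to zero, which is what the slow EMA is guarding against.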

The theoretical grounding is Schultz's (1997) phasic dopamine as reward prediction error and the Rescorla-Wagner (1972) model of error-driven associative learning. The implementation is simple: prediction outcomes become neurochemical signals that modulate attention and learning. The system that predicted wrong pays attention differently next time.

Predictive Suppression: Learning What to Ignore

A separate addition worth noting: a sparse inhibitory weight matrix that learns to suppress predictable concept co-activations. When two concepts co-activate above threshold (5 occurrences), STDP strengthens an inhibitory connection between them (max weight 0.6). When the source fires without the target 10+ times, the inhibition decays via extinction.
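The potentiation/extinction dynamics can be sketched per concept pair. The thresholds (5 co-activations, max weight 0.6, 10 lone firings) are from the post; the step and decay rates are assumed:

```python
from collections import defaultdict

CO_THRESHOLD = 5          # co-activations before inhibition starts forming
MAX_W = 0.6               # inhibitory weight ceiling
EXTINCT_AFTER = 10        # lone-source firings before extinction kicks in
STEP, DECAY = 0.05, 0.9   # hypothetical potentiation / extinction rates

co_count = defaultdict(int)
miss_count = defaultdict(int)
inhib = defaultdict(float)      # sparse inhibitory weights: (src, tgt) -> w

def observe(src_active, tgt_active, pair=("A", "B")):
    if src_active and tgt_active:
        co_count[pair] += 1
        miss_count[pair] = 0
        if co_count[pair] >= CO_THRESHOLD:      # STDP-style potentiation
            inhib[pair] = min(inhib[pair] + STEP, MAX_W)
    elif src_active:
        miss_count[pair] += 1
        if miss_count[pair] >= EXTINCT_AFTER:   # extinction decay
            inhib[pair] *= DECAY

def suppressed(activation, pair=("A", "B")):
    """Subtract the learned inhibition from the target's activation."""
    return max(0.0, activation - inhib[pair])
```

A pair that reliably co-activates climbs to the 0.6 ceiling and the target's activation is damped accordingly; once the correlation breaks, repeated lone firings decay the weight back toward zero.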

Clark (2013) estimates ~80% of cortical activity is suppression of predicted inputs. This module is a first step toward that ratio. It sits between PC modulation and Hebbian learning — after the network modulates activations, but before the system learns from them. The result: the system stops wasting processing on things it already expects, and focuses on what it doesn't.


Next: Inner Speech — what happens when the system starts talking to itself.

References:

  • Salvatori, T., et al. (2024). Incremental Predictive Coding. arXiv.
  • Rao, R. P. & Ballard, D. H. (1999). Predictive coding in the visual cortex. Nature Neuroscience, 2(1), 79-87.
  • Friston, K. (2005). A theory of cortical responses. Phil. Trans. R. Soc. B, 360(1456), 815-836.
  • Graziano, M. S. (2013). Consciousness and the Social Brain. Oxford University Press.
  • Schultz, W. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593-1599.
  • Rescorla, R. A. & Wagner, A. R. (1972). A theory of Pavlovian conditioning. In Classical Conditioning II.
  • Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. BBS, 36(3), 181-204.
  • N'dri et al. (2025). PCL: Predictive Coding Layers.
  • Feldman, H. & Friston, K. (2010). Attention, uncertainty, and free-energy. Frontiers in Human Neuroscience, 4, 215.
