Learning What Will Happen Next: Predictive Coding in Hyperspace
Part 10 of the Daimon Update Series — March 2026
Post 2 described Daimon's first predictive coding network: a 4-layer float-domain hierarchy running inference iterations every 800ms tick. It worked. It learned to predict activation patterns across sensory, spreading, collision, and workspace layers, and the prediction errors fed back into the cognitive cycle as surprise signals.
But it was built on a foreign substrate. The predictive coding network used 128-dimensional float vectors, Xavier initialization, tanh activation — standard neural network machinery. Everything else in Daimon runs on 10,000-bit binary vectors with Hamming distance and Hebbian learning. The PC network was a float island in a binary sea, requiring constant translation between representations.
The new hierarchical predictive coding system eliminates the translation. Prediction, error, and learning are all native HDM operations. Prediction error is XOR. Learning is Hebbian bit-flips. The generative model is HDM spreading activation — the same learned associations that drive every other cognitive process. The entire predictive hierarchy runs on the same algebra as perception, memory, and action.
The Tier Hierarchy
The system implements four temporal tiers, following Kiebel, Daunizeau & Friston's (2008) principle that predictive coding hierarchies should operate at different timescales — fast at the bottom (sensory), slow at the top (narrative):
Tier 0 — Sensory (every tick, ~800ms): The ground floor. State is clamped to the current activation pattern bundled via majority vote from the top-16 active concepts. This tier never generates predictions downward — it's the sensory evidence that anchors the entire hierarchy.
Tier 1 — Associative (every tick): Predicts the current sensory state from learned associations. Updates every tick. This tier captures fast statistical regularities — concepts that reliably co-occur within a few seconds of each other.
Tier 2 — Episodic (every 5 ticks, ~4 seconds): Predicts the associative state at a slower timescale. Captures regularities that unfold over seconds — the kind of temporal structure that episodic memory tracks.
Tier 3 — Narrative (every 25 ticks, ~20 seconds): Predicts the episodic state at the slowest timescale. Captures the broad cognitive context — what Daimon has been thinking about over the last half-minute. This tier's prediction errors are the most informative: they signal genuine shifts in cognitive context rather than moment-to-moment fluctuation.
Each tier maintains a state vector (a ConceptVector), a prediction of the tier below, an error vector, an error magnitude (EMA-smoothed), and a correction vector that absorbs systematic prediction residuals.
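The per-tier bookkeeping can be sketched as a small record. This is an illustrative Python shape, not Daimon's actual types: vectors are modeled as integer bitmasks, and the field names are assumptions drawn from the description above.

```python
from dataclasses import dataclass

DIM = 10_000  # bits per ConceptVector

@dataclass
class Tier:
    period: int             # update every N ticks: 1, 1, 5, 25
    state: int = 0          # current state vector, stored as an int bitmask
    prediction: int = 0     # top-down prediction of the tier below
    error: int = 0          # XOR of prediction and actual
    error_ema: float = 0.5  # EMA-smoothed error magnitude, starts at chance
    correction: int = 0     # absorbs systematic prediction residuals

    def smooth_error(self, magnitude: float, alpha: float = 0.1) -> None:
        """Exponential moving average of the raw error magnitude."""
        self.error_ema = (1 - alpha) * self.error_ema + alpha * magnitude

# One tier per temporal level: sensory, associative, episodic, narrative.
tiers = [Tier(period=p) for p in (1, 1, 5, 25)]
```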
Native Binary Arithmetic
The elegance — and the point — is that every operation uses HDM primitives:
Prediction error = XOR. The difference between what was predicted and what actually happened is computed as the bitwise XOR of two ConceptVectors. The number of differing bits (popcount) gives the error magnitude. This is the same Hamming distance computation that drives similarity search, nearest-neighbor retrieval, and every other comparison in the system. Error and similarity are the same measurement on different scales.
Learning = Hebbian nudging. When the error at a tier exceeds a threshold, correction learning kicks in: probabilistic bit-flips that nudge the correction vector toward reducing the systematic component of the error. The flip probability is proportional to the learning rate, gated by dopamine (three-factor Hebbian rule). More dopamine, faster learning — the system learns more aggressively from events that its own reward system marks as significant. Each flip is capped at a maximum probability to prevent catastrophic overwriting.
Generative model = spreading activation. The top-down prediction at each tier is generated by finding the k-nearest HDM concepts to the current state, then bundling them into a predicted vector with the correction vector applied. This IS HDM spreading activation — the same process that drives concept retrieval, association, and inference everywhere in the architecture. The generative model doesn't need separate weights. It uses the weights that Hebbian learning has been accumulating since the system started.
Total memory: ~25 KB. Four correction vectors at ~1.2 KB each, plus per-tier state and prediction vectors. Compare to the float-domain PC network's 192 KB. The binary system is 8x smaller and uses no floating-point arithmetic.
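The footprint arithmetic checks out under a simple assumed breakdown of four vectors per tier (state, prediction, error, correction), which is a reading of the description above rather than a confirmed layout:

```python
BITS = 10_000
vec_kb = BITS / 8 / 1024       # one ConceptVector: ~1.22 KB
# Assumed: four vectors per tier across four tiers, plus bookkeeping.
hierarchy_kb = 4 * 4 * vec_kb  # ~19.5 KB, consistent with the ~25 KB figure
ratio = 192 / 25               # old float network vs. new binary system
```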
Inference Settling
The initial implementation was one-shot: predict, compute error, learn, move on. This is fast but suboptimal. Millidge, Seth & Buckley (2021) showed that prediction errors only approximate backpropagation gradients at equilibrium — when the prediction-error-correction cycle has converged. A single pass gives a rough approximation. Multiple passes give better learning signals.
The settling loop runs up to 5 iterations per tick. Each iteration:
- Each tier generates a top-down prediction of the tier below
- Error is computed between prediction and actual state
- Lower tier states are nudged toward the prediction (symmetric propagation)
- Convergence is checked: if the total error delta across all tiers drops below a threshold (0.002), the loop stops early
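The control flow of the steps above can be sketched on its own; here the prediction/error and nudging steps are passed in as callables (and simulated in the example) so the sketch is self-contained, while the real implementation operates on the tier vectors directly:

```python
def settle(compute_errors, nudge_states, max_iterations: int = 5,
           delta_threshold: float = 0.002) -> int:
    """Run predict/error/nudge passes until total error stops changing.
    Returns the number of iterations used."""
    prev_total = None
    for iteration in range(1, max_iterations + 1):
        total = compute_errors()  # top-down predictions, then XOR errors
        nudge_states()            # symmetric propagation toward predictions
        if prev_total is not None and abs(prev_total - total) < delta_threshold:
            return iteration      # converged early
        prev_total = total
    return max_iterations

# Simulated error trajectory: the delta falls under 0.002 on the third pass.
errors = iter([0.50, 0.30, 0.299])
used = settle(lambda: next(errors), lambda: None)
```

Passing `max_iterations=1` degenerates to the one-shot behavior, mirroring the `settling_max_iterations=1` config option mentioned below.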
The state nudging is the critical addition. In the one-shot version, information flows only upward (error) and learning only affects correction vectors. With settling, information flows bidirectionally: top-down predictions nudge lower tier states, and those nudged states produce different bottom-up signals to higher tiers. The dynamics converge to an equilibrium where predictions and states are maximally consistent across the entire hierarchy.
This is Rao & Ballard's (1999) original vision of predictive coding in visual cortex: bidirectional prediction/error flow where higher levels explain away lower-level activity and lower levels constrain higher-level models. The settling loop implements the iterative message passing that makes this work.
At runtime, convergence typically occurs in 2-3 iterations. The system rarely needs all 5. Setting settling_max_iterations=1 in the config recovers one-shot behavior for comparison.
Downstream Effects
Prediction errors don't stay inside the predictive tiers module. They flow outward through three channels:
Neuromodulation: A tier_surprise signal aggregates weighted error across all tiers and feeds into the neuromodulatory system. High tier surprise produces a dopamine boost (+0.08, signaling insight from prediction failure), a serotonin dip (-0.06, signaling an unsettled internal model), and a norepinephrine boost (+0.10, signaling uncertainty). The neurochemistry doesn't just reflect surprise — it responds to it, adjusting the field dynamics (Post 9) and attention allocation in real time.
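The mapping is simple enough to state directly. The deltas below are the ones quoted above; the trigger threshold and function name are assumed placeholders:

```python
def modulate(tier_surprise: float, threshold: float = 0.5) -> dict[str, float]:
    """Map aggregate tier surprise to neuromodulator deltas (sketch)."""
    if tier_surprise <= threshold:
        return {"dopamine": 0.0, "serotonin": 0.0, "norepinephrine": 0.0}
    return {
        "dopamine": +0.08,        # insight from prediction failure
        "serotonin": -0.06,       # unsettled internal model
        "norepinephrine": +0.10,  # uncertainty
    }
```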
Consolidation learning: Every 50 cycles (~40 seconds), the predictive tiers module emits a LearningSignal that feeds into consolidation learning. The signal encodes the current error magnitude, which tier generated it, and the learning rate. This means predictive failure contributes to long-term memory consolidation — patterns that the system fails to predict get preferential encoding.
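An illustrative shape for the signal and its emission cadence; the field names are assumptions, not Daimon's actual `LearningSignal` definition:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LearningSignal:       # illustrative shape only
    error_magnitude: float
    tier: int
    learning_rate: float

def maybe_emit(cycle: int, error: float, tier: int, lr: float,
               period: int = 50) -> Optional[LearningSignal]:
    """Emit a consolidation signal every `period` cycles (~40 s at 800 ms)."""
    if cycle % period != 0:
        return None
    return LearningSignal(error_magnitude=error, tier=tier, learning_rate=lr)
```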
CycleResult metrics: Weighted error, per-tier error magnitudes, settling iterations, and convergence status are all written to the CycleResult and exposed via socket commands and the Mind-View UI. The Mind-View displays four gauges (sensory, associative, episodic, narrative) with compression progress bars and a weighted error sparkline, plus a reference line at 0.50 — the null model baseline.
The Null Model
The null model baseline deserves emphasis. A random predictor of 10,000-bit binary vectors will have an expected error magnitude of 0.50 — half the bits will differ by chance. Anything below 0.50 represents genuine learning. Anything at or above 0.50 means the system is doing no better than random guessing.
Within the first few hundred cycles after deployment, every predictive tier was running below the null baseline: Tier 1 (associative) at ~0.31, Tier 2 (episodic) at ~0.25, Tier 3 (narrative) at ~0.30. Weighted error: 0.31. The system is predicting its own cognitive states with roughly 38% less error than chance.
This is not a spectacular number. It means the system eliminates a bit more than a third of the error a chance predictor would make. But it's genuine learning on a genuine prediction task, using no external supervision, running on pure HDM arithmetic. And it improves over time — the correction vectors slowly absorb systematic patterns, and Hebbian learning continuously refines the associations that generate predictions.
The Theory Testing Framework
Predictive coding tells the system what will happen next in its own cognitive dynamics. But Daimon also makes predictions about the external world — weather patterns, earthquake frequencies, tidal heights, market movements, page view trends. A separate framework turns these predictions into formal experiments.
Hypothesis Registry (Popper 1959): A ring buffer of 64 testable hypotheses with 10 metric types. Each hypothesis is a directional prediction ("collision rate will increase over the next hour") with a measurement horizon. The registry auto-registers hypotheses from Working Memory .hypothesis items and verifies them against actual metrics when the horizon expires. Cumulative accuracy tracking via EMA, starting at 0.5 (chance). Tetlock's (2005) superforecasting calibration: track not just whether predictions are right, but whether the system's confidence matches its accuracy.
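A minimal sketch of the registry mechanics, with assumed names and an assumed EMA rate: a bounded ring buffer plus a cumulative accuracy score that starts at chance.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Hypothesis:        # illustrative shape only
    metric: str
    direction: str       # "increase" or "decrease"
    horizon: int         # cycle at which to verify

class HypothesisRegistry:
    def __init__(self, capacity: int = 64):
        self.buffer: deque = deque(maxlen=capacity)  # ring buffer semantics
        self.accuracy = 0.5                          # start at chance

    def register(self, h: Hypothesis) -> None:
        self.buffer.append(h)

    def verify(self, h: Hypothesis, observed_delta: float,
               alpha: float = 0.05) -> None:
        """EMA update of cumulative accuracy against the observed outcome."""
        correct = (observed_delta > 0) == (h.direction == "increase")
        self.accuracy = (1 - alpha) * self.accuracy + alpha * float(correct)
```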
Parameter Sandbox (Berkenkamp et al. 2017): Bounded experiments on tuning parameters. The system can modify its own configuration within safety bounds — maximum 30% deviation from baseline, at most one concurrent experiment, 4-hour cap, automatic rollback if a health score drops below 30. Each experiment auto-registers a hypothesis in the registry, creating a prediction about its own self-modification. Safe exploration with a circuit breaker.
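The safety envelope reduces to two small checks, shown here with the bounds quoted above (function names are illustrative):

```python
def propose(baseline: float, candidate: float, max_dev: float = 0.30) -> float:
    """Clamp a proposed parameter to within ±30% of its baseline value."""
    lo, hi = baseline * (1 - max_dev), baseline * (1 + max_dev)
    return min(max(candidate, lo), hi)

def check_rollback(health: float, floor: float = 30.0) -> bool:
    """Automatic rollback trigger when the health score drops below the floor."""
    return health < floor
```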
Inter-Agent Prediction Market (Galton 1907; Brier 1950): Three agents (Daimon, Alethea, Eidothea) independently predict 8 external domains. Each agent has per-domain credibility tracked via EMA Brier scores. Predictions are blended using credibility-weighted averaging — agents with better track records in a domain contribute more to the consensus. Pure database-mediated communication — no direct inter-agent messaging required. The wisdom of crowds applied to a crowd of three autonomous cognitive agents.
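Credibility-weighted blending can be sketched as inverse-Brier weighting: agents with lower smoothed Brier scores (better calibration) pull the consensus toward their forecast. The weighting scheme here is an assumption; only the Brier score itself and the EMA tracking are from the description above.

```python
def brier(forecast: float, outcome: int) -> float:
    """Brier score for a single probabilistic forecast (lower is better)."""
    return (forecast - outcome) ** 2

def blend(forecasts: dict, brier_ema: dict) -> float:
    """Consensus forecast, weighting each agent inversely to its Brier EMA."""
    weights = {a: 1.0 / (b + 1e-6) for a, b in brier_ema.items()}
    total = sum(weights.values())
    return sum(weights[a] * forecasts[a] for a in forecasts) / total
```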
Runtime SOAR Rules (Holland 1975): Dynamic production rules that the system discovers and tests. Template functions accept condition/action patterns as parameters. Trial rules start at low specificity and must demonstrate effectiveness (>= 0.30) within 24 hours to solidify; below 0.10, they're pruned. This is an adaptive production system — it literally rewrites its own decision rules based on empirical performance.
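The rule lifecycle boils down to a three-way decision, using the thresholds quoted above (the function and its return labels are illustrative):

```python
def rule_status(effectiveness: float, hours_elapsed: float,
                trial_hours: float = 24.0) -> str:
    """Trial-rule lifecycle: solidify, keep on trial, or prune."""
    if effectiveness >= 0.30:
        return "solid"                     # demonstrated effectiveness
    if effectiveness < 0.10:
        return "pruned"                    # clearly ineffective
    # Middling rules survive only until the trial window closes.
    return "trial" if hours_elapsed < trial_hours else "pruned"
```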
Agent-Directed Counterfactuals (Kahneman & Tversky 1982): The imagination engine, previously auto-generating counterfactual scenarios, now accepts proposals from the agents. A priority queue of 4 directed scenarios gives agent-proposed experiments precedence over random exploration. Each scenario runs up to 20 simulation ticks through the causal model, and high-salience imagined outcomes are recorded as virtual causal entries.
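A sketch of the directed-scenario queue: a bounded priority queue in which agent-proposed scenarios outrank random exploration. The class shape and priority values are assumptions; only the capacity of 4 and the precedence rule come from the description above.

```python
import heapq

class ScenarioQueue:
    """Bounded priority queue; agent-directed scenarios run first (sketch)."""
    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self._heap = []   # (priority, sequence, scenario) tuples
        self._seq = 0

    def propose(self, scenario: str, agent_directed: bool) -> bool:
        if len(self._heap) >= self.capacity:
            return False                       # queue full, proposal rejected
        priority = 0 if agent_directed else 1  # lower priority value runs first
        heapq.heappush(self._heap, (priority, self._seq, scenario))
        self._seq += 1
        return True

    def next(self):
        return heapq.heappop(self._heap)[2] if self._heap else None
```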
Prediction as Substrate
The original predictive coding post (Post 2) described prediction as one of several cognitive capabilities — alongside activation, collision, learning, and memory. After this update, prediction is closer to substrate.
The hierarchical tiers predict Daimon's own cognitive trajectory using the same HDM operations that everything else uses. The theory testing framework predicts external world states and learns from the results. The inter-agent prediction market aggregates predictions socially. The parameter sandbox predicts the effects of self-modification. The counterfactual engine predicts hypothetical futures.
Every one of these systems has a closed loop: predict → observe → compute error → learn → predict better. And every one of these loops feeds into the others. Prediction errors from the tiers modulate neurochemistry, which shapes the neural field, which produces resonances, which trigger new predictions. External prediction failures adjust credibility scores, which change social blending weights, which change future predictions. Self-modification experiments test predictions about the system's own behavior, with results feeding back into the hypothesis registry.
The free energy principle (Friston 2005) proposes that cognition IS prediction error minimization — that everything a brain does can be understood as attempting to reduce the difference between what it expects and what it observes. Whether that's literally true of biological brains is debatable. In Daimon, it's becoming architecturally true. Not because the system was designed to implement free energy minimization, but because every new capability, when its loop is properly closed, turns out to be another prediction system.
Next: "We: Social Cognition Among Autonomous Agents" — when agents start understanding each other.
References:
- Rao, R. P. N. & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79-87.
- Kiebel, S. J., Daunizeau, J., & Friston, K. J. (2008). A hierarchy of time-scales and the brain. PLoS Computational Biology, 4(11), e1000209.
- Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B, 360(1456), 815-836.
- Millidge, B., Seth, A., & Buckley, C. L. (2021). Predictive coding approximates backprop along arbitrary computation graphs. NeurIPS.
- Popper, K. (1959). The Logic of Scientific Discovery. Routledge.
- Tetlock, P. E. (2005). Expert Political Judgment. Princeton University Press.
- Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1), 1-3.
- Galton, F. (1907). Vox populi. Nature, 75, 450-451.
- Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.
- Berkenkamp, F., et al. (2017). Safe model-based reinforcement learning with stability guarantees. NeurIPS.
- Kahneman, D. & Tversky, A. (1982). The simulation heuristic. In Judgment Under Uncertainty: Heuristics and Biases. Cambridge University Press.
- Campbell, D. T. (1960). Blind variation and selective retention in creative thought as in other knowledge processes. Psychological Review, 67(6), 380-400.
- Rescorla, R. A. & Wagner, A. R. (1972). A theory of Pavlovian conditioning. In Classical Conditioning II. Appleton-Century-Crofts.