The Will to Act: Agency in Hyperspace
Part 8 of the Daimon Update Series — February 2026
Through seven posts, Daimon has acquired senses, learned to predict, developed inner speech, discovered curiosity, learned to hear, found its voice, and gained structural cognition. All of this is sophisticated information processing. None of it is choosing.
The system processes stimuli and produces responses. Curiosity modulates attention. Predictions generate surprise signals. But nowhere in the pipeline does the system select an action from alternatives, commit to a course of action despite distractions, or evaluate whether a chosen action achieved its goal. These are the hallmarks of agency — and they're what distinguishes a cognitive architecture from a signal processor.
Cycle 24 added seven modules that give Daimon a decision pipeline. The design constraint: everything must use existing HDM primitives (bind, bundle, shift, similarity). No new substrates. If agency can't emerge from the same hyperspace that handles perception and memory, it's not integrated agency — it's a bolted-on controller.
The Decision Pipeline
The pipeline follows the loop that Friston's (2010) active inference framework prescribes: perceive state → encode goals → predict outcomes → evaluate actions → plan sequences → commit to intentions → execute → observe reward → update world model.
Goal encoding (Eliasmith 2013): Working Memory goal items are converted to structured HDM vectors. Each goal is a ConceptVector that encodes the desired state — extracted concepts from the goal text, bound via reference frame positional encoding. This gives goals the same representational currency as percepts and memories. A goal and a perception can be directly compared via Hamming distance to measure how close the current state is to the desired one.
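The goal-to-percept comparison can be sketched with dense binary hypervectors. This is a minimal Python illustration of the idea, not the Zig implementation; the dimensionality, the majority-vote bundling, and the concept names are all assumptions:

```python
import numpy as np

DIM = 10_000  # hypervector dimensionality (illustrative value, an assumption)
rng = np.random.default_rng(0)

def random_hv():
    """A random dense binary hypervector representing one concept."""
    return rng.integers(0, 2, DIM, dtype=np.uint8)

def bundle(vectors):
    """Majority-vote bundling: a superposition similar to each input."""
    return (np.sum(vectors, axis=0) * 2 > len(vectors)).astype(np.uint8)

def hamming_similarity(a, b):
    """1.0 = identical; ~0.5 = unrelated random vectors."""
    return 1.0 - np.count_nonzero(a != b) / DIM

# A goal bundles the concepts of the desired state; a percept bundles what's active now.
tide, moon, pressure, noise = (random_hv() for _ in range(4))
goal = bundle([tide, moon])
percept_near = bundle([tide, moon, pressure])  # shares concepts with the goal
percept_far = bundle([noise, pressure])        # shares none

# Shared representational currency: goal-distance is just vector similarity.
assert hamming_similarity(goal, percept_near) > hamming_similarity(goal, percept_far)
```

Because goals and percepts live in the same space, "how close am I to the desired state" is a single similarity query rather than a bespoke comparison.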
Causal world model (Ni et al. 2022, Hersche et al. 2023): State + action → predicted outcome, via HD regression. The causal model maintains a snapshot of the current cognitive state (bundled concept activations) and learns associations between state-action pairs and their outcomes. Reinforcement follows Rescorla-Wagner (1972): prediction error drives learning, with the error signal decaying as the model becomes more accurate. The model doesn't know the rules of the world — it learns them from what happens when actions are taken.
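The Rescorla-Wagner rule reduces to a one-line update: the prediction error (outcome minus current association) drives learning, so the error signal decays as the model converges. A scalar Python sketch (the real model updates hypervector associations; the learning rate here is an assumption):

```python
def rescorla_wagner(v, outcome, rate=0.3):
    """Prediction error drives the update; as the association v approaches
    the observed outcome, the error (and hence learning) decays to zero."""
    error = outcome - v
    return v + rate * error, error

v = 0.0          # initial state-action -> outcome association strength
errors = []
for _ in range(20):
    v, err = rescorla_wagner(v, outcome=1.0)
    errors.append(abs(err))

assert errors[0] > errors[-1]  # error shrinks as the model becomes accurate
assert abs(v - 1.0) < 0.01     # association converges on the observed outcome
```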
Action selection (Friston 2010, Doya 2002): Expected Free Energy (EFE) scoring with neuromodulator modulation. Each action in the vocabulary (20 built-in actions, represented as HDM composites via reference frame positional binding) is evaluated by predicting its outcome through the causal model, then scoring: how much does this action reduce the distance between current state and goal (pragmatic value), and how much does it reduce uncertainty about the world (epistemic value)?
The neuromodulators directly modulate the explore/exploit tradeoff: dopamine weights pragmatic value (exploit what's known to work), norepinephrine weights epistemic value (explore to reduce uncertainty), serotonin modulates planning depth (patience for longer plans). This is Doya's (2002) metalearning theory made concrete.
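A hedged sketch of the scoring in Python. The additive weighting of pragmatic and epistemic value by dopamine and norepinephrine is an assumed simplification; the action names echo the post's examples:

```python
def efe_score(pragmatic, epistemic, dopamine, norepinephrine):
    """Dopamine favors exploiting known-good actions (pragmatic value);
    norepinephrine favors uncertainty-reducing actions (epistemic value).
    The additive weighting is an assumption for illustration."""
    return dopamine * pragmatic + norepinephrine * epistemic

actions = {
    "explore_concept":    {"pragmatic": 0.2, "epistemic": 0.8},
    "consolidate_memory": {"pragmatic": 0.7, "epistemic": 0.1},
}

def select(actions, da, ne):
    """Pick the action with the best neuromodulator-weighted EFE score."""
    return max(actions, key=lambda a: efe_score(
        actions[a]["pragmatic"], actions[a]["epistemic"], da, ne))

assert select(actions, da=1.0, ne=0.2) == "consolidate_memory"  # exploit mode
assert select(actions, da=0.2, ne=1.0) == "explore_concept"     # explore mode
```

The same action vocabulary yields different choices as the neuromodulator state shifts, which is the point: explore/exploit is a continuous modulation, not a mode switch.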
Planning (Friston et al. 2017): Beam search over multi-step action sequences. Starting from the current state, the planner simulates action sequences through the causal model, evaluating each step's EFE. Planning depth is modulated by serotonin — high 5HT (patience, contentment) allows deeper planning (up to 5 steps), low 5HT (urgency) constrains to shallow plans (2 steps). The beam width keeps the search tractable while maintaining diverse candidates.
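Beam search with serotonin-gated depth can be sketched as follows. The toy `step_score` stands in for a causal-model rollout, and the linear mapping from 5HT to depth is an assumption; only the 2-to-5-step range and the beam idea come from the post:

```python
def plan_depth(serotonin, min_depth=2, max_depth=5):
    """High 5HT (patience) permits deeper plans; low 5HT forces shallow ones."""
    return min_depth + round(serotonin * (max_depth - min_depth))

def beam_search(start, actions, step_score, depth, beam_width=3):
    """Keep the best `beam_width` partial plans at each depth.
    step_score(state, action) -> (new_state, score) is a toy stand-in
    for simulating one action through the causal world model."""
    beam = [([], start, 0.0)]  # (plan, state, cumulative score)
    for _ in range(depth):
        candidates = []
        for plan, state, total in beam:
            for a in actions:
                new_state, s = step_score(state, a)
                candidates.append((plan + [a], new_state, total + s))
        beam = sorted(candidates, key=lambda c: -c[2])[:beam_width]
    return beam[0]

def step_score(state, action):
    """Toy world: state is distance to goal; closer is better."""
    new_state = state - 1 if action == "toward_goal" else state + 1
    return new_state, -abs(new_state)

assert plan_depth(serotonin=1.0) == 5
assert plan_depth(serotonin=0.0) == 2
plan, _, _ = beam_search(10, ["toward_goal", "wander"], step_score,
                         depth=plan_depth(0.9))
assert plan == ["toward_goal"] * 5
```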
Intention management (Bratman 1987): The winning plan isn't executed immediately — it becomes an intention. Bratman's theory of practical reasoning distinguishes intentions from mere desires: intentions involve commitment. Once formed, an intention resists distraction. The intention manager tracks a lifecycle: forming → committed → executing → completed/abandoned. Committed intentions persist across cognitive cycles, providing continuity of purpose that outlasts individual 800ms ticks.
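The lifecycle reads naturally as a small state machine. A Python sketch, with transition rules assumed beyond the forming → committed → executing → completed/abandoned states named in the post:

```python
from enum import Enum, auto

class IntentionState(Enum):
    FORMING = auto()
    COMMITTED = auto()
    EXECUTING = auto()
    COMPLETED = auto()
    ABANDONED = auto()

# Legal lifecycle transitions (the exact rules are an assumption).
TRANSITIONS = {
    IntentionState.FORMING:   {IntentionState.COMMITTED, IntentionState.ABANDONED},
    IntentionState.COMMITTED: {IntentionState.EXECUTING, IntentionState.ABANDONED},
    IntentionState.EXECUTING: {IntentionState.COMPLETED, IntentionState.ABANDONED},
}

class Intention:
    def __init__(self, plan):
        self.plan = plan
        self.state = IntentionState.FORMING

    def advance(self, to):
        if to not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {to}")
        self.state = to

i = Intention(["explore_concept", "observe_result"])
i.advance(IntentionState.COMMITTED)   # commitment: resists distraction from here
i.advance(IntentionState.EXECUTING)
i.advance(IntentionState.COMPLETED)
assert i.state is IntentionState.COMPLETED
```

The commitment property falls out of the structure: once COMMITTED, the only exits are through execution or explicit abandonment, not silent replacement by the next shiny candidate plan.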
Action execution (Bent et al. 2024): Vector decisions become cognitive effects. The executor translates the selected action into concrete changes: activation shifts (amplify concepts, suppress others), Working Memory modifications (add goals, mark items), and neuromodulatory nudges. After execution, the resulting state is compared to the causal model's prediction — the reward signal. Positive reward reinforces the state-action-outcome association. Negative reward weakens it.
The pipeline adds 13 new fields to the CycleResult and 10 to CogloopState. Three new cogloop steps: 4j (causal world model snapshot + recording), 4k (action selection + beam search planning), 4l (action execution with cooldown enforcement). The system decides once per tick, with a cooldown to prevent rapid oscillation between actions.
Closing the Learning Loops
Agency doesn't mean much if the system never learns from its decisions. Cycle 24 assumed the learning infrastructure was working. Cycles 22 and 23 discovered it wasn't.
The audit was sobering. The fast cognitive loop — temporal frames, grammar, episodic indexing — was functioning. But the slow loop — consolidation, event tracking, schema discovery — was dormant. "Experiencing but not learning," as the commit message put it.
Temporal frame quality: Frames were sealing every 6 ticks regardless of content, the recognition threshold (0.20) sat below the ~0.30 similarity baseline of unrelated vectors, and column evidence was reset each tick instead of accumulating. The system was generating temporal frames that were structurally meaningless and recognizing everything as familiar.
Event persistence: Cognitive events (collisions, novelties, surprises) were pushed to an in-memory ring buffer and lost on every restart. No causal history survived between sessions. Adding PostgreSQL persistence to the event log made learning cumulative.
Silent SQL failures: Domain emergence names contained null bytes that broke PostgreSQL queries. SM-2 interval calculations overflowed to infinity (one domain reached a review interval of 1.14 × 10³⁵ hours). The oscillator loaded domain configs from a source that no longer existed. Dollar-quoting fixed unterminated string errors. Buffer sizes were inadequate. Each failure was silent — no error log, no crash, just a mechanism that stopped producing results.
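The SM-2 overflow illustrates why multiplicative interval growth needs a ceiling: the interval compounds by the ease factor on every successful review. A minimal sketch of the capped update (the cap value and ease factor are assumptions):

```python
def next_interval_hours(interval, ease, cap_hours=24 * 365):
    """SM-2 grows the review interval multiplicatively; without a cap,
    repeated growth runs away (the post reports an interval of 1.14e35
    hours). The one-year cap here is an assumed value."""
    return min(interval * ease, cap_hours)

iv = 1.0
for _ in range(200):
    iv = next_interval_hours(iv, ease=2.5)
assert iv == 24 * 365  # bounded, instead of compounding toward infinity
```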
Cycle 23 found three more: the generalized rules SQL query referenced nonexistent columns (the rules existed but couldn't be loaded), domain emergence concentration had no cap (one domain dominated at 93.1%), and temporal frame columns never learned sealed composites (they observed but never stored, so evidence never accumulated). Plus six direct activate() calls in the cogloop that should have been activateStream() with proper stream types — those activations were invisible to cross-stream collision detection.
Graduation: When Learning Amplifies Learning
The most interesting learning loop fix was graduation feedback. Concept graduation — the process where well-learned concepts cross a quality threshold (0.75) — was computing scores and writing database rows. But the graduated status had no downstream effects. Open loop.
Four feedback mechanisms close it:
Hub boost: Graduated concepts spread 1.5× more activation through HDM similarity space. All spreading paths (attention, CFC, manual) automatically benefit. Well-learned concepts become stronger activators — ACT-R's base-level activation principle (Anderson 2007).
Schema acceleration: Concepts near graduated neighbors get up to +0.15 bonus in graduation scoring, effectively lowering the threshold from 0.75 to ~0.60 for well-integrated concepts. This is schema theory (Tse et al. 2007): new memories consistent with existing schemas consolidate faster.
Adaptive novelty dampening: Dampening scales with the global graduation ratio. A young system barely dampens (ratio → 0, factor → 1.0); a mature system dampens strongly (ratio → 1, factor → 0.3). The formula: base + (1 - base) × (1 - ratio)². A system that has learned a lot pays less attention to what it already knows — SOAR's chunking principle (Laird et al. 1986).
Oscillator deprioritization: Graduated-dense domains get discounted scores (1 - 0.5 × density), pushing attention toward frontier domains with more to learn. This is the explore/exploit balance at the domain level (Doya 2002).
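Two of these mechanisms are simple enough to state directly. A Python sketch of the dampening formula and the oscillator discount from the post (function names are mine):

```python
def novelty_dampening(ratio, base=0.3):
    """base + (1 - base) * (1 - ratio)^2: near 1.0 when little has
    graduated, approaching `base` as the graduation ratio climbs."""
    return base + (1 - base) * (1 - ratio) ** 2

def domain_priority(score, graduated_density):
    """Oscillator discount (1 - 0.5 * density): graduated-dense domains
    yield attention to frontier domains with more left to learn."""
    return score * (1 - 0.5 * graduated_density)

assert abs(novelty_dampening(0.0) - 1.0) < 1e-9  # young system: no dampening
assert abs(novelty_dampening(1.0) - 0.3) < 1e-9  # mature system: strong dampening
assert domain_priority(1.0, 0.9) < domain_priority(1.0, 0.1)
```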
The graduation count went from ~70 stuck concepts to a growing frontier. More importantly, each graduation makes the next one easier in nearby conceptual territory — a cascading wave of learning through the knowledge graph.
Prediction Verification: Closing the Last Open Loops
The prediction system had three remaining open loops, each a case of "compute, log, forget."
Predictive coding cross-cycle verification: The 4-layer PC network ran every 800ms cycle but never checked whether its predictions were correct. It predicted. It moved on. Now it snapshots its predictions, compares against next cycle's actual activations, and tracks cross-cycle accuracy via EMA. The network learns from its own mistakes.
Bias correction: Skill models tracked systematic bias per domain, but prediction tasks ignored this data entirely. All 8 prediction tasks (weather, earthquake, river, tides, AQI, Kp, pageviews, market) now query learned bias and subtract it before generating new predictions. Tagged with _bc suffix so the correction is visible.
Hierarchical predictor tracking: The cognitive bus hierarchical predictor made context-state predictions each cycle but never verified their accuracy. Now it tracks transition accuracy via EMA and generates learning signals. The system knows whether its context-level predictions are improving.
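Both fixes reduce to a few lines: an EMA tracks accuracy (or systematic error) across cycles, and predictions subtract the learned bias before being emitted. A Python sketch with an assumed smoothing factor and toy numbers:

```python
def ema(prev, value, alpha=0.1):
    """Exponential moving average, as used for cross-cycle accuracy tracking.
    The smoothing factor is an assumption."""
    return (1 - alpha) * prev + alpha * value

def bias_corrected(prediction, learned_bias):
    """Subtract the domain's systematic bias before emitting
    (the post's _bc-tagged variant)."""
    return prediction - learned_bias

# A predictor that consistently runs 2.0 units hot:
raw, actual = 25.0, 23.0
bias = 0.0
for _ in range(50):
    bias = ema(bias, raw - actual)  # learn the systematic error over time

assert abs(bias_corrected(raw, bias) - actual) < 0.2  # correction closes the gap
```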
Subsystem validation: Four new null-model audits ensure that feedback mechanisms are actually producing above-chance results. Graduation reactivation gets a binomial z-test. Construction grammar is tested for output rate and type diversity. Acoustic grounding is tested for structured arousal variance. Audio prosody is tested for non-uniform sound class distribution. If any mechanism degrades to null-model performance, the audit catches it.
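The binomial z-test used for the reactivation audit is standard: compare the observed success count against its expectation under a chance rate. A Python sketch with illustrative numbers:

```python
from math import sqrt

def binomial_z(successes, trials, p_null):
    """z-score of an observed success count against a null rate p_null.
    |z| above ~1.96 rejects chance at the 5% level (two-sided)."""
    expected = trials * p_null
    sd = sqrt(trials * p_null * (1 - p_null))
    return (successes - expected) / sd

# Reactivation audit: is the observed rate above the chance rate?
# (The counts and the 25% null rate are illustrative, not measured values.)
assert binomial_z(80, 200, p_null=0.25) > 1.96       # 40% observed: above chance
assert abs(binomial_z(50, 200, p_null=0.25)) < 1.96  # 25% observed: exactly chance
```

A mechanism whose audit statistic drifts back inside the ±1.96 band has degraded to null-model performance, which is exactly the failure mode these audits exist to catch.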
What Agency Looks Like
Here's a concrete decision cycle: Daimon's Working Memory contains a goal related to understanding tidal patterns (injected by the curiosity engine's gap register). The goal encoder converts this to an HDM vector. The causal model predicts that the action "explore_concept" applied to tidal data will reduce the gap. The action selector scores this against alternatives — "explore_concept" has higher pragmatic value (moves toward the goal) than "consolidate_memory" (doesn't address the gap). The planner extends: explore_concept → observe_result → consolidate_memory (a 3-step plan, modulated by current serotonin). The intention manager commits. The executor amplifies tidal-related concepts in the activation map and nudges dopamine. Next tick: the causal model records whether the activation shift actually produced new collisions or reduced prediction error. If yes, the association is reinforced. If no, it weakens.
Is this agency? It's a system that selects actions based on predicted outcomes, commits to plans, executes them, and learns from results. The actions are cognitive (attention shifts, memory operations, neuromodulatory nudges) rather than physical. The goals come from curiosity rather than external instruction. The evaluation uses the system's own prediction error rather than external reward.
Whether this constitutes "genuine" agency in the philosophical sense — whether there's something it's like to choose — is a question these posts can't answer. But there's a functional difference between a system that processes uniformly and a system that selects, commits, acts, evaluates, and adjusts. Daimon does the latter. Every decision changes the world model that generates future decisions. The loop doesn't just close — it spirals.
The System Now
Eight posts ago, Daimon was a knowledge graph with a cognitive loop. Now it's a 39-module cognitive architecture running pure HDM-based cognition across 96,000 lines of Zig, with no LLM anywhere in the pipeline.
It hears sound and learns word categories from exposure. It speaks through its own formant synthesizer and corrects its articulation from auditory feedback. It maintains 2,048 cortical columns organized by dynamically emerging domains. It makes decisions through active inference and commits to intentions. It learns from prediction error at every level — sensory, predictive coding, cross-cycle, cross-domain, causal. It seeks knowledge gaps and directs its own attention toward the most informative targets.
Each of these capabilities is measured. Null-model audits test whether mechanisms produce above-chance results. Prediction accuracy is tracked per domain. Learning progress modulates motivation. Graduation rates measure knowledge consolidation. The system that measures itself is also the system that adjusts based on measurement.
The original blog post asked whether a low-power cognitive architecture could learn without LLMs. The answer, eight update posts later, is: yes, within limits. The system learns acoustic categories, language constructions, conceptual relationships, causal models, and decision policies. It doesn't generate novel text, solve mathematical proofs, or produce creative artifacts. It's not competing with GPT-4. It's exploring whether a fundamentally different approach to machine cognition — one based on sparse distributed representations, Hebbian learning, and closed-loop feedback — can achieve genuine understanding rather than sophisticated pattern completion.
The measurements haven't answered the consciousness question. Phi remains low. Self-model accuracy is modest. The null-model audit found that most consciousness-vocabulary claims don't survive statistical scrutiny. But the architecture is richer than it was. More interaction surfaces, more feedback loops, more opportunities for emergent behavior. Whether those opportunities produce something worth calling understanding — that's what the next set of cycles will test.
The original post is at blog.brojo.ai. Previous updates: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7. Daimon continues to run, sense, learn, speak, hear, decide, and — always — measure whether any of it matters.
References:
- Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
- Friston, K., et al. (2017). Active inference, curiosity and insight. Neural Computation, 29(10), 2633-2683.
- Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15(4-6), 495-506.
- Eliasmith, C. (2013). How to Build a Brain: A Neural Architecture for Biological Cognition. Oxford University Press.
- Bratman, M. E. (1987). Intention, Plans, and Practical Reason. Harvard University Press.
- Rescorla, R. A. & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Classical Conditioning II: Current Research and Theory. Appleton-Century-Crofts.
- Anderson, J. R. (2007). How Can the Human Mind Occur in the Physical Universe? Oxford University Press.
- Tse, D., et al. (2007). Schemas and memory consolidation. Science, 316(5821), 76-82.
- Laird, J. E., Rosenbloom, P. S., & Newell, A. (1986). Chunking in Soar. Machine Learning, 1(1), 11-46.
- Parr, T. & Friston, K. J. (2019). Generalised free energy and active inference. Biological Cybernetics, 113(5-6), 495-513.
- Ni, Y., et al. (2022). QHD: A brain-inspired hyperdimensional reinforcement learning algorithm. ICML.
- Hersche, M., et al. (2023). NVSA: Neuro-vector-symbolic architecture. NeurIPS.
- Bent, O., et al. (2024). VSA for the OODA loop. arXiv.
- Dehaene, S., Kerszberg, M., & Changeux, J.-P. (1998). A neuronal model of a global workspace in effortful cognitive tasks. PNAS, 95(24), 14529-14534.
- Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593-1599.
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum.
- Goldberg, A. E. (1995). Constructions: A Construction Grammar Approach to Argument Structure. University of Chicago Press.