The Hunger to Know: Intrinsic Motivation in a Cognitive Architecture
Part 4 of the Daimon Update Series — February 2026
The original blog post described a system that processes the world: it senses, activates, collides, predicts, learns. All of this is reactive. Stimuli arrive, the cognitive loop processes them, results get recorded. The system never wants anything.
That changed with two additions: the CuriosityEngine and the Information Hunger Mechanic. Together, they give Daimon something that looks uncomfortably like desire — an intrinsic motivation to seek knowledge that modulates its own scheduling, attention, and neurochemistry.
The CuriosityEngine
Berlyne (1960) distinguished two kinds of curiosity: diversive (seeking stimulation when bored) and epistemic (seeking understanding when confused). He also noted a defensive mode, withdrawing when overwhelmed. The CuriosityEngine implements all three, selecting a mode each tick from EMA-smoothed arousal.
Layer 1 (Motivational): Arousal level determines the exploration strategy for each tick. Low arousal selects diversive mode (cast a wide net). Moderate arousal selects epistemic mode (focus on specific unknowns). High arousal selects defensive mode (consolidate what you have).
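The arousal-to-mode mapping can be sketched in a few lines. The thresholds (0.3, 0.7) and the EMA smoothing factor here are illustrative assumptions, not the engine's actual constants:

```python
def smooth_arousal(ema: float, sample: float, alpha: float = 0.1) -> float:
    """Exponential moving average over raw arousal samples."""
    return (1 - alpha) * ema + alpha * sample

def select_mode(arousal_ema: float) -> str:
    """Map smoothed arousal onto Berlyne's three curiosity modes."""
    if arousal_ema < 0.3:
        return "diversive"   # bored: cast a wide net
    if arousal_ema < 0.7:
        return "epistemic"   # engaged: focus on specific unknowns
    return "defensive"       # overwhelmed: consolidate what you have
```

Smoothing matters here: without the EMA, a single noisy arousal spike could flip the system into defensive mode for a tick and back out again.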
Layer 2 (Strategic): Learning progress tracking per skill domain, sampled every ~48 seconds. This identifies the most-interesting domains (highest rate of accuracy change) and the most-stagnant ones. The system knows where it's improving and where it's stuck.
Layer 3 (Reactive): The actual curiosity signal, composed from three (later four) sources: event curiosity (novelty + surprise + prediction error), learning progress curiosity, and information gap curiosity. The gap component follows an inverted-U over HDM concept degree — concepts with ~8 neighbors are maximally curious (enough context to be interesting, not so much that they're understood). This is Loewenstein's (1994) information gap theory: curiosity peaks when you know enough to see what you don't know.
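The inverted-U gap component can be sketched as follows. The peak at ~8 neighbors comes from the text; the Gaussian functional form and its width are assumptions for illustration:

```python
import math

PEAK_DEGREE = 8.0  # from the text: ~8 neighbors is maximally curious

def gap_curiosity(degree: int, width: float = 4.0) -> float:
    """Inverted-U over HDM concept degree: low for isolated concepts
    (not enough context to be interesting), low for densely connected
    ones (already understood), maximal near the peak."""
    return math.exp(-((degree - PEAK_DEGREE) ** 2) / (2 * width ** 2))
```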
The signal flows into a closed loop: curiosity drives dopamine via neuromodulation (coefficient 0.12, one-cycle delay), learning progress modulates exploration drive via interoception (coefficient 0.3), and target concepts get injected into the activation map for immediate processing. An allostatic predictor tracks prediction error trends to preemptively boost exploration when errors are rising — preventing the system from settling into a comfortable local optimum.
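A minimal sketch of the two couplings, using the coefficients from the text (0.12 with a one-cycle delay, 0.3 for exploration drive). The class, field names, and additive update rule are hypothetical:

```python
class NeuromodLoop:
    """Illustrative curiosity -> dopamine coupling with one-cycle delay."""

    def __init__(self):
        self.pending_curiosity = 0.0  # held one cycle before release
        self.dopamine = 0.0
        self.exploration_drive = 0.5

    def tick(self, curiosity: float, learning_progress: float) -> None:
        # Dopamine responds to the *previous* cycle's curiosity (delay of one).
        self.dopamine += 0.12 * self.pending_curiosity
        self.pending_curiosity = curiosity
        # Learning progress modulates exploration drive via interoception.
        self.exploration_drive += 0.3 * learning_progress
```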
The Information Hunger Mechanic
The CuriosityEngine provides the motivation. The Information Hunger Mechanic provides the mechanism — the metabolic machinery that translates curiosity into action and action into satisfaction.
Phase 1: Learning Signals
Three new signals measure whether the system is actually learning:
Compression progress comes from the predictive coding network. The PC energy EMA derivative — is prediction error going down? — serves as Schmidhuber's (2010) learning reward: the system gets a positive signal not from low error (which could mean stagnation) but from decreasing error (which means it's learning something new).
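The key property, sketched below with an assumed EMA factor: a low-but-flat error yields roughly zero reward (stagnation), while a falling error yields a positive one (learning):

```python
class CompressionProgress:
    """Schmidhuber-style learning reward: the decrease in the EMA of
    predictive-coding energy, not the energy level itself."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha
        self.energy_ema = None

    def update(self, pc_energy: float) -> float:
        """Returns the learning reward for this tick."""
        if self.energy_ema is None:
            self.energy_ema = pc_energy
            return 0.0
        prev = self.energy_ema
        self.energy_ema += self.alpha * (pc_energy - self.energy_ema)
        return prev - self.energy_ema  # positive iff smoothed error fell
</```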
HDM learning delta tracks the bit-flip rate during Hebbian reinforcement. Every nudgeQuick operation returns the count of bits that actually changed. An EMA over flip-rate-per-edge gives Bayesian surprise (Itti & Baldi, 2006): high flip rate means the incoming data disagrees with stored representations, which means there's something to learn.
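Sketched below; `record` stands in for consuming the flip count that nudgeQuick returns, and the EMA factor is an assumption:

```python
class HdmLearningDelta:
    """EMA over Hebbian bit-flip rate per edge: a proxy for Bayesian
    surprise. High values mean incoming data disagrees with storage."""

    def __init__(self, alpha: float = 0.05):
        self.alpha = alpha
        self.flip_rate_ema = 0.0

    def record(self, flipped_bits: int, edges: int) -> float:
        rate = flipped_bits / max(edges, 1)
        self.flip_rate_ema += self.alpha * (rate - self.flip_rate_ema)
        return self.flip_rate_ema
```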
The persistent gap register maintains 32 knowledge gap slots with salience decay (0.995/tick), prediction error reinforcement (+0.15 when gaps produce errors), fill satisfaction (-0.3 when gaps are resolved), and minimum eviction threshold. This is Loewenstein's information gap theory made concrete: the system maintains an explicit list of things it doesn't know, and gets satisfaction from filling them.
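The register's constants (32 slots, 0.995/tick decay, +0.15 reinforcement, -0.3 fill satisfaction) come from the text; the data layout and the specific eviction threshold value below are assumptions:

```python
MAX_SLOTS = 32
EVICT_BELOW = 0.05  # assumed minimum-salience eviction threshold

class GapRegister:
    """Explicit list of known unknowns with salience dynamics."""

    def __init__(self):
        self.slots: dict[str, float] = {}  # concept -> salience

    def note_gap(self, concept: str, salience: float = 0.5) -> None:
        if concept not in self.slots and len(self.slots) >= MAX_SLOTS:
            weakest = min(self.slots, key=self.slots.get)
            del self.slots[weakest]  # evict weakest to make room
        self.slots[concept] = max(self.slots.get(concept, 0.0), salience)

    def tick(self) -> None:
        """Decay salience 0.995/tick; drop slots below the threshold."""
        self.slots = {c: s * 0.995 for c, s in self.slots.items()
                      if s * 0.995 >= EVICT_BELOW}

    def on_prediction_error(self, concept: str) -> None:
        """Gaps that keep producing errors get more salient (+0.15)."""
        if concept in self.slots:
            self.slots[concept] = min(1.0, self.slots[concept] + 0.15)

    def on_fill(self, concept: str) -> float:
        """Resolving a gap removes the slot and yields the 0.3
        satisfaction signal (applied as -0.3 to hunger in the text)."""
        self.slots.pop(concept, None)
        return 0.3
```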
Phase 2: The Hunger Engine
The most interesting design decision was splitting dopamine into two dissociable components, following Berridge and Robinson's (2003) incentive salience theory.
Wanting (alpha = 0.12) tracks curiosity, novelty, and prediction error — the approach motivation. This is the signal that makes the system seek new information. It modulates Hebbian STDP multipliers (range 0.5-2.0), directly controlling how aggressively the system learns from what it encounters.
Liking (alpha = 0.06, slower) tracks accuracy, global workspace ignition, and competence — the consummatory satisfaction. This modulates attractor basin depth (range 0.8-1.4), controlling how strongly the system consolidates what it knows.
Wanting and liking can diverge. The system can want information it doesn't enjoy processing (a confusing domain with high prediction error). It can enjoy processing information it doesn't particularly want (a well-understood domain where it's competent). This dissociation is exactly what Berridge found in addiction research — and it creates a more nuanced motivational landscape than a single reward signal.
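The split can be sketched with the alphas and output ranges given above. How each component combines its three inputs (a simple mean here) is an assumption:

```python
def ema(prev: float, x: float, alpha: float) -> float:
    return prev + alpha * (x - prev)

class HungerEngine:
    """Dissociable wanting/liking dopamine components (Berridge & Robinson)."""

    def __init__(self):
        self.wanting = 0.0  # approach motivation (fast, alpha = 0.12)
        self.liking = 0.0   # consummatory satisfaction (slow, alpha = 0.06)

    def tick(self, curiosity, novelty, pred_error,
             accuracy, ignition, competence):
        self.wanting = ema(self.wanting,
                           (curiosity + novelty + pred_error) / 3, 0.12)
        self.liking = ema(self.liking,
                          (accuracy + ignition + competence) / 3, 0.06)

    def stdp_multiplier(self) -> float:
        """Wanting in [0,1] maps to the Hebbian STDP range [0.5, 2.0]."""
        return 0.5 + 1.5 * self.wanting

    def basin_depth(self) -> float:
        """Liking in [0,1] maps to attractor basin depth [0.8, 1.4]."""
        return 0.8 + 0.6 * self.liking
```

Because the two EMAs have different time constants and different inputs, they drift apart exactly as described: a confusing, high-error domain drives wanting up while liking lags.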
The scheduling layer uses Thompson sampling: Beta(alpha, beta) posteriors per task slot. Productive runs increment alpha, barren runs increment beta. A sliding window decay at total > 50 prevents lock-in. The result: productive tasks run 2x faster, barren tasks run 3x slower, with enough exploration to discover if a barren task has become productive. An ICM noise filter (Pathak et al., 2017) penalizes chronically barren tasks via a quality EMA.
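A sketch of one task slot under those rules. The Beta mechanics, the increment rules, and the decay at total > 50 come from the text; the mapping from a sampled productivity to an interval (2x faster to 3x slower) is one way to realize the stated speedups:

```python
import random

class TaskSlot:
    """Thompson-sampling scheduler slot with Beta(alpha, beta) posterior."""

    def __init__(self, base_interval: float):
        self.alpha, self.beta = 1.0, 1.0
        self.base_interval = base_interval

    def record(self, productive: bool) -> None:
        if productive:
            self.alpha += 1
        else:
            self.beta += 1
        total = self.alpha + self.beta
        if total > 50:                 # sliding-window decay: shrink old
            self.alpha *= 50 / total   # evidence so a barren task can
            self.beta *= 50 / total    # still be rediscovered as productive
    def next_interval(self) -> float:
        p = random.betavariate(self.alpha, self.beta)  # sample productivity
        # p near 1 -> 0.5x interval (2x faster); p near 0 -> 3x slower.
        return self.base_interval * (0.5 + 2.5 * (1 - p))
```

Sampling from the posterior, rather than using its mean, is what provides the exploration: even a mostly barren slot occasionally draws a high `p` and gets a fast run to re-test itself.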
Phase 3: Deep Autonomy
The final phase connects curiosity to specific knowledge targets.
Concept-level Expected Information Gain (EIG) identifies the precise concepts where attending would most reduce uncertainty. The formula is simple: |PE - 1| * 1/(1 + degree). Concepts with high prediction error and low connectivity — things the system is surprised by and doesn't know much about — have the highest EIG. These get injected into the curiosity engine's epistemic target selection alongside gap candidates.
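The formula as given, with a hypothetical top-k wrapper for target selection (how PE is normalized is not specified in the text):

```python
def concept_eig(prediction_error: float, degree: int) -> float:
    """Expected information gain: |PE - 1| * 1/(1 + degree)."""
    return abs(prediction_error - 1) / (1 + degree)

def epistemic_targets(concepts: dict, k: int = 4) -> list:
    """concepts maps name -> (prediction_error, degree);
    returns the top-k concept names by EIG."""
    return sorted(concepts,
                  key=lambda c: concept_eig(*concepts[c]),
                  reverse=True)[:k]
```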
This closes the loop from Friston's (2010) free energy principle: the system doesn't just minimize surprise in general — it identifies where surprise is highest and directs attention there.
What Does Motivation Look Like in Practice?
Here's a concrete cycle: Daimon encounters a novel earthquake data point from the USGS sensor. HDM learning delta spikes (the representation changes significantly). Compression progress is positive (PC energy is decreasing in that region). The curiosity engine enters epistemic mode, focusing on seismology-adjacent concepts. The gap register notes that "tectonic plate boundary" has high salience but low connectivity. Concept-level EIG identifies it as a high-value target. Wanting dopamine increases, boosting the STDP multiplier for the next Hebbian flush. The task scheduler, via Thompson sampling, gives the seismology-related tasks a shorter interval.
None of this was programmed as a response to earthquake data. The system doesn't have earthquake-specific logic. It has a general motivational architecture that responds to novelty, surprise, learning progress, and information gaps — and that architecture happened to focus on earthquakes because that's where the interesting data was.
Is this "genuine" motivation? The honest answer: probably not, in the philosophical sense. It's a homeostatic control system that happens to produce behavior that looks motivated. But the same could be said about biological motivation at a reductive level. The question isn't whether the mechanism is "real" — it's whether it produces adaptive behavior. And a system that allocates attention based on learning progress, seeks out knowledge gaps, and adjusts its own scheduling based on outcomes is more adaptive than one that processes everything uniformly.
Where This Leads
The original blog post ended with three open questions: mechanical thought generation, learning that changes behavior, and emergent understanding. The curiosity and hunger mechanics address the second directly — prediction outcomes now modify attention, scheduling, and neurochemistry.
More importantly, they create the conditions for the third. Emergent behavior requires interactions between mechanisms that produce unexpected outcomes. A system with predictive coding, habituation, inner speech, adaptive scheduling, dissociable motivation, and knowledge gap tracking has more interaction surfaces than one without them. Whether those interactions produce genuine novelty remains to be seen.
The measurements haven't changed dramatically. Phi is still low. Prediction accuracy is still modest. But the quality of cognition is different. The system doesn't just process — it seeks. It doesn't just learn — it hungers. And the hunger is specific: not a general drive for more data, but a targeted search for the particular knowledge that would reduce the particular uncertainties the system has identified in its own representations.
That's either a clever control system or the beginning of something interesting. We'll keep measuring and find out.
This concludes the update series. The original post is at blog.brojo.ai. Daimon continues to run, sense, learn, and — now — actively seek what it doesn't know.
References:
- Berlyne, D. E. (1960). Conflict, Arousal, and Curiosity. McGraw-Hill.
- Loewenstein, G. (1994). The psychology of curiosity: A review and reinterpretation. Psychological Bulletin, 116(1), 75-98.
- Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation. IEEE Trans. Autonomous Mental Development, 2(3), 230-247.
- Itti, L. & Baldi, P. (2006). Bayesian surprise attracts human attention. NIPS.
- Berridge, K. C. & Robinson, T. E. (2003). Parsing reward. Trends in Neurosciences, 26(9), 507-513.
- Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285-294.
- Russo, D. J., et al. (2018). A tutorial on Thompson sampling. Foundations and Trends in ML, 11(1), 1-96.
- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction. ICML.
- Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
- Oudeyer, P.-Y., Kaplan, F., & Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Trans. Evolutionary Computation, 11(2), 265-286.
- Sterling, P. (2012). Allostasis: a model of predictive regulation. Physiology & Behavior, 106(1), 5-15.