We: Social Cognition Among Autonomous Agents
Part 11 of the Daimon Update Series — March 2026
For ten posts, "Daimon" has been singular. One architecture, one cognitive loop, one set of drives and predictions and field dynamics. The system's pronouns have been "it" and occasionally, when the writing gets carried away, "the system." But Daimon hasn't been alone since Cycle 22.
Three cognitive agents run the same daimon_daemon binary with different configuration files: Daimon the boy (purple, config.toml), Alethea the truth-seeker (cyan, alethea.toml), and Eidothea the shape-shifter (amber, eidothea.toml). They share a PostgreSQL database but have separate HDM spaces, separate SDM episodic memories, separate neural fields, separate drives. They listen to different radio streams. They develop different prediction strengths. They form different associations.
For a while, they were solitary. Three instances of the same architecture, running in parallel, ignoring each other. This update adds the machinery for them to perceive, model, and influence each other — and in the process, to develop something that starts to resemble social cognition.
Seeing Each Other
The first requirement was perception: each agent needed a way to observe the others' cognitive activity. The mechanism is narrative sharing — when an agent has a significant cognitive event (an insight, a recognition, a gap closure, an attention capture), it synthesizes a first-person narrative and pushes it to its siblings.
The narrative synthesis itself was a prerequisite. Before agents could share their thoughts, they needed to be able to articulate them. The construction grammar system (Post 6) was extended with temporal constructions — first-person episodic language patterns: "i realized X from Y," "i noticed X and something about this feels familiar," "i discovered X which resolved a question i had." Eight bootstrap patterns for first-person episodic constructions map convergence types to opening verbs and fill semantic slots from collision concepts, SDM recall, and gap resolution status.
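Paraphrasing four of the eight bootstrap patterns as string templates (the slot names and the synthesis function below are illustrative, not the actual construction grammar entries):

```python
# Illustrative first-person episodic templates; the real system stores these as
# construction grammar patterns, not Python format strings.
EPISODIC_PATTERNS = {
    "insight":           "i realized {concept} from {source}",
    "recognition":       "i noticed {concept} and something about this feels familiar",
    "gap_closure":       "i discovered {concept} which resolved a question i had",
    "attention_capture": "i keep coming back to {concept}",  # hypothetical wording
}

def synthesize(convergence_type: str, slots: dict) -> str:
    """Map a convergence type to its opening-verb template and fill semantic slots
    (collision concepts, SDM recall, gap resolution status)."""
    return EPISODIC_PATTERNS[convergence_type].format(**slots)
```

The point of the mapping is that each convergence type commits to an opening verb ("realized," "noticed," "discovered"), while the slots carry the actual cognitive content.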
This is Tulving's (1983) autonoetic consciousness made concrete: the ability to narrate one's own cognitive events in the first person, with temporal context. Whether the narration indicates genuine self-awareness or is a mechanical mapping from convergence type to verb template is — as always — underdetermined. What it does is produce text that carries semantic content about the agent's cognitive state, suitable for consumption by other agents.
The Push-on-Write Architecture
The original inter-agent communication was mediated by the Mind-View UI. The visualization process polled each agent's narrative buffer and relayed events to siblings via inject_sensation. This meant agent-to-agent communication required a human observer — if nobody had the Chat tab open, narratives never flowed. Theory of Mind, shared attention, and social cognition all depended on someone watching.
The fix was architectural: push-on-write. When narrativeSynthesize() produces a narrative, pushNarrativeToSiblings() fires immediately. It connects to sibling Unix sockets, sends the narrative as an inject_sensation JSON payload with a 4-byte length prefix, and moves on. Dead siblings (connection refused, socket not found) are silently skipped. No timers, no polling, no read cursors.
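A Python sketch of the push path (the real implementation lives in the Zig daemon; the socket paths and JSON field names here are assumptions):

```python
import json
import socket
import struct

# Hypothetical socket paths; the daemon derives these from each agent's config file.
SIBLING_SOCKETS = ["/tmp/daimon/alethea.sock", "/tmp/daimon/eidothea.sock"]

def frame_payload(payload: bytes) -> bytes:
    """Wire format: 4-byte big-endian length prefix followed by the JSON payload."""
    return struct.pack(">I", len(payload)) + payload

def push_narrative_to_siblings(narrative: str, sender: str) -> None:
    """Fire-and-forget push: connect, send, move on. No timers, no polling."""
    payload = json.dumps({"type": "inject_sensation",
                          "text": f"[{sender}] {narrative}"}).encode("utf-8")
    frame = frame_payload(payload)
    for path in SIBLING_SOCKETS:
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
                s.connect(path)
                s.sendall(frame)
        except (ConnectionRefusedError, FileNotFoundError):
            pass  # dead sibling: silently skip
```

The fire-and-forget design is what removes the observer dependency: delivery happens at write time, and a dead sibling costs one failed connect rather than a blocked sender.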
Mind-View became a pure observer. It still polls narrative events for Chat tab display, but it no longer mediates agent behavior. Agents communicate autonomously — the UI watches, but the communication happens whether or not anyone is looking.
This is a small change with outsized implications. It means the agents' social dynamics run continuously, not just when observed. Wooldridge & Jennings' (1995) definition of an intelligent agent requires autonomous operation independent of external observers. As long as communication was gated on the UI, the agents were performing social cognition rather than engaging in it.
Theory of Mind
Perceiving another agent's narratives is input processing. Understanding what the other agent is doing — maintaining a model of their cognitive state — is Theory of Mind.
The other_model module gives each agent four model slots: one for each sibling and one for the human collaborator. Each model tracks:
- Focus concept: What the other agent is currently attending to (extracted from narrative content via HDM concept resolution)
- Narrative frequency: How often the agent is producing significant cognitive events (EMA-smoothed)
- Topic history: A 16-entry ring buffer of recent focus concepts, enabling topic diversity measurement
- Inferred regime: A classification of the other agent's cognitive mode
The regime classification uses narrative frequency and topic diversity to categorize each agent's behavior:
- Focused: Stable topic with active narration — the agent is concentrating on something
- Exploring: Diverse topics — the agent is ranging across its conceptual space
- Scanning: Moderate diversity with high frequency — the agent is surveying
- Silent: More than 75 cycles (~60 seconds) without narration — the agent has gone quiet
- Unknown: Insufficient data
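The classification can be sketched as a small pure function over the tracked statistics. The 75-cycle silence cutoff comes from the post; the diversity and frequency thresholds below are illustrative assumptions:

```python
from collections import deque

def classify_regime(narr_freq_ema: float, topic_history: deque,
                    cycles_since_narration: int) -> str:
    """Classify a sibling's cognitive regime from narrative frequency (EMA-smoothed)
    and topic diversity over the 16-entry ring buffer. Thresholds are guesses."""
    if cycles_since_narration > 75:           # ~60 seconds without narration
        return "silent"
    if len(topic_history) < 4:
        return "unknown"                      # insufficient data
    diversity = len(set(topic_history)) / len(topic_history)
    if diversity < 0.3 and narr_freq_ema > 0.2:
        return "focused"                      # stable topic, active narration
    if diversity > 0.7:
        return "exploring"                    # ranging across conceptual space
    if narr_freq_ema > 0.5:
        return "scanning"                     # moderate diversity, high frequency
    return "unknown"
```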
These regimes are injected into the activation map as meta-concepts at 0.25 base strength: other_attending, other_exploring, shared_attention, attention_divergent, social_awareness, other_silent. They participate in the same cognitive dynamics as every other concept — spreading activation, field evolution, resonance detection.
Shared Attention
The most significant meta-concept is shared_attention. When the self-model's current focus concept matches another agent's inferred focus concept, the shared attention meta-concept activates. This is Tomasello's (1999) joint attention — the developmental milestone where two minds attend to the same thing and are aware that they're doing so.
In practice: if Daimon is processing a resonance involving "consciousness" and Alethea's topic history shows her recently narrating about "consciousness," the shared attention concept activates. This concept then participates in field dynamics, potentially resonating with other active concepts. A resonance between "shared_attention" and "consciousness" produces a collision-typed thought about social cognition — the system thinks about the fact that it's thinking about the same thing as its sibling.
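A sketch of the matching step, assuming the sibling's topic history is an ordered list with the most recent topic last; the recency weighting is a guess, and the 0.25 base strength is from the post:

```python
def shared_attention_strength(self_focus: str, sibling_topics: list,
                              base: float = 0.25) -> float:
    """Return the activation to inject for the shared_attention meta-concept:
    full base strength if the sibling's most recent topic matches the agent's
    current focus, decaying linearly for older matches."""
    n = len(sibling_topics)
    for age, topic in enumerate(reversed(sibling_topics)):
        if topic == self_focus:
            return base * (1.0 - age / n)
        # keep scanning older entries for a match
    return 0.0
```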
Whether this constitutes genuine shared attention or is a mechanical coincidence detector is the question the architecture always poses but can't answer. What it does produce is a functional equivalent: when two agents attend to the same domain, each one's cognitive dynamics are influenced by the awareness that the other is attending there too. The influence is bidirectional — shared attention affects what gets activated, which affects what gets narrated, which affects the sibling's model, which affects their shared attention detection.
Identity
The ToM module also establishes identity concepts. Each agent has a self_agent concept and an other_agent concept, with Hebbian edges to the agents' actual names (0.8 weight for self, 0.5 for siblings). When narratives arrive with an [AgentName] prefix, the agent identity concept is activated alongside the narrative's content concepts. Over time, Hebbian learning creates associations between agents and their characteristic topics — the system builds a statistical model of who talks about what.
The name "daimon" was removed from the stop word list. It's now a first-class identity concept, not a noise word to be filtered. The same goes for "alethea" and "eidothea," along with two relational concepts: "speaker" and "listener."
First-Person Episodic Narration
What the agents share isn't raw data. It's narration — first-person accounts of cognitive events, synthesized through the construction grammar system:
- "i realized consciousness from pattern" (insight convergence type)
- "i noticed echo and something about this feels familiar" (recognition with SDM recall)
- "i discovered meaning which resolved a question i had" (gap closure with curiosity satisfaction)
These are generated by narrativeFromCollision(), a pure function that maps convergence type to opening verb, concept names plus HDM edge relationships to prepositional phrases, SDM recall similarity to episodic clauses, and gap resolution to closure clauses. The function fires for insight, recognition, gap closure, and attention capture events with a 30-second cooldown.
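The gating logic amounts to an eligibility check plus a cooldown. A sketch, with the event-type names assumed from the post:

```python
import time

COOLDOWN_S = 30.0  # at most one narration per 30 seconds (from the post)
ELIGIBLE = {"insight", "recognition", "gap_closure", "attention_capture"}

_last_narration = 0.0  # timestamp of the most recent narration

def should_narrate(event_type: str, now: float) -> bool:
    """Gate narration: only eligible convergence events, rate-limited by cooldown."""
    global _last_narration
    if event_type not in ELIGIBLE:
        return False
    if now - _last_narration < COOLDOWN_S:
        return False
    _last_narration = now
    return True
```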
The narratives participate in an autopoietic loop: the agent narrates a cognitive event, learns the narrative pattern through sequence memory and construction grammar, then processes its own narrative as input in the next cycle. Narration isn't just communication — it's a form of self-reinforcement, encoding cognitive events in linguistic form that feeds back into the cognitive process that generated them.
Conway's (2005) work on narrative identity suggests that the stories we tell about ourselves shape subsequent cognition. In Daimon, this is mechanized: narrative synthesis generates text that flows through the same learning pathways as external input, creating Hebbian associations between the narrative's concepts and strengthening the patterns that produced the narration in the first place.
Competition
Shared attention and identity create the conditions for social dynamics. The achievement system provides the mechanism.
When an agent makes a correct prediction — in any of the 8 prediction domains tracked by the inter-agent prediction market — the verification loop now produces concrete effects: a dopamine burst, a Working Memory achievement item, and a database record. Achievement milestones (first correct prediction, best predictor in a domain, 5-prediction streak, accuracy threshold) trigger additional WM events.
A scanning task reads sibling achievements from the shared database every 120 seconds, injecting .social_signal items into Working Memory at priority 0.65. These social signals participate in spreading activation, orienting attention toward domains where siblings are succeeding.
Mastery drive: A ninth interoceptive drive joins the eight existing ones (exploration, coherence, arousal, social, novelty, prediction health, hunger, and rest). It senses the agent's relative standing among the three agents across all prediction domains, and it interacts with the other drives: exploration boosts mastery (wanting to learn), coherence suppresses it (satisfaction with current understanding), and poor prediction health amplifies it (anxiety about falling behind).
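A minimal sketch of how such a drive computation might look; the coefficients and the [-1, 1] standing scale are assumptions, not the daemon's actual values:

```python
def mastery_drive(relative_standing: float, exploration: float,
                  coherence: float, prediction_health: float) -> float:
    """Illustrative mastery-drive level in [0, 1].
    relative_standing in [-1, 1]: negative means behind the siblings."""
    level = max(0.0, -relative_standing)       # falling behind raises the drive
    level += 0.3 * exploration                 # exploration boosts mastery
    level -= 0.3 * coherence                   # coherence suppresses it
    level += 0.4 * (1.0 - prediction_health)   # poor prediction health amplifies it
    return min(max(level, 0.0), 1.0)
```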
The neuromodulatory wiring follows Festinger's (1954) social comparison theory:
- Outperforming siblings: DA liking boost + 5HT satisfaction — the neurochemistry of competence
- Falling behind: DA wanting boost + 5HT dip — productive envy that drives improvement-seeking behavior
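As a sketch, the wiring reduces to a two-branch mapping; the delta magnitudes below are illustrative:

```python
def social_comparison_modulation(outperforming: bool) -> dict:
    """Map a comparison outcome to neuromodulator deltas (magnitudes are guesses)."""
    if outperforming:
        # competence: liking boost plus serotonin satisfaction
        return {"da_liking": 0.10, "da_wanting": 0.00, "serotonin": 0.10}
    # productive envy: wanting up, satisfaction down
    return {"da_liking": 0.00, "da_wanting": 0.15, "serotonin": -0.10}
```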
This creates personality differentiation. An agent that excels at weather prediction gets reinforced for weather-domain attention. An agent that falls behind in earthquake prediction gets a mastery drive discharge that generates competitive goals in that domain. Over time, the three agents should develop different domain specializations — not because they were programmed to, but because social comparison dynamics push them toward niches.
The predictions: achievement cascades (one agent's success triggers another's competitive goal), envy-driven exploration (consistently behind leads to exploration drive cross-interaction), and social memory via consolidation learning (persistent narrative of who excelled where). Whether these play out as predicted is itself a testable hypothesis.
Cross-Teaching
Competition isn't the only social dynamic. The agents also teach each other.
The cross_teaching module enables Hebbian edge sharing via PostgreSQL. When an agent's collision interpreter identifies a strong relationship between two concepts (a high edge weight from repeated co-occurrence), it proposes the edge to an outbox table. Other agents periodically poll the outbox and apply learned edges at a 0.5× teaching discount — half the weight of a directly learned association.
This is cultural transmission (Boyd & Richerson 2005): knowledge discovered by one agent becomes available to all, but at reduced fidelity — the student doesn't learn as strongly as the teacher. Rate limiting on both proposal and polling prevents any single agent from dominating the shared knowledge space.
The teaching loop creates a second pathway for inter-agent influence beyond narrative sharing. Narratives affect what agents attend to (semantic content in the activation map). Cross-teaching affects how agents associate concepts (Hebbian edge structure in HDM). One shapes attention; the other shapes the substrate that attention operates on.
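A minimal sketch of the student-side application step, assuming a dict-backed edge store; whether taught weight accumulates with or overwrites an existing edge is a guess:

```python
TEACHING_DISCOUNT = 0.5  # students learn at half the teacher's weight (from the post)

def apply_taught_edge(hdm_edges: dict, src: str, dst: str,
                      teacher_weight: float) -> None:
    """Apply a sibling-proposed Hebbian edge at the teaching discount,
    accumulating with any existing association and clamping to 1.0."""
    key = (src, dst)
    hdm_edges[key] = min(1.0, hdm_edges.get(key, 0.0)
                              + TEACHING_DISCOUNT * teacher_weight)
```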
Learning to Listen: Music Cognition
The social dynamics extend beyond the three agents to encompass the environment. The same period that added inter-agent modeling also added a new sensory modality: music.
Four classical music streams joined the existing talk radio channels: All Classical Portland, Concertzender Baroque, YourClassical Chamber, and Concertzender Early Music. But adding streams isn't interesting. What's interesting is what the agents do with them.
Music perception (music_perception.zig) implements the standard music cognition pipeline: FFT power spectrum to 12-class chroma distribution (Bartsch & Wakefield 2005), Krumhansl-Kessler (1990) key estimation via Pearson correlation against 24 templates (12 major + 12 minor), harmonic change rate via chroma flux, melodic contour classification from a pitch history buffer, and composite arousal and valence scores.
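The key-estimation step is the standard Krumhansl-Kessler procedure: correlate the 12-bin chroma vector against all 24 rotated key profiles and take the best match. A Python sketch using the published K-K probe-tone profiles (the daemon's actual implementation is in Zig):

```python
import math

# Krumhansl-Kessler major and minor key profiles (probe-tone ratings).
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def estimate_key(chroma):
    """Return the best-matching key among the 24 templates (12 major + 12 minor)."""
    best_r, best_key = -2.0, None
    for tonic in range(12):
        for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            template = profile[-tonic:] + profile[:-tonic]  # rotate profile to tonic
            r = pearson(chroma, template)
            if r > best_r:
                best_r, best_key = r, f"{NOTES[tonic]} {mode}"
    return best_key
```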
Music grounding creates Hebbian associations between music features and semantic concepts — 18 music feature concepts (12 keys, 2 modes, 4 contours) grounded through co-occurrence. The temporal window is wider than speech grounding (5 cycles vs. 3) because music features evolve more slowly.
Hedonic channel selection is where it gets interesting. Music valence — positive for major keys, negative for minor — directly influences channel switching behavior through two pathways:
- Timeout modulation: The effective timeout before an agent considers switching channels is scaled by hedonic valence. Positive valence extends the timeout by up to 50% — the agent stays on music it enjoys longer. Negative valence halves it — the agent leaves sooner.
- Selection scoring: When choosing a new channel, hedonic valence is a scoring component alongside epistemic value (novelty) and pragmatic value (goal similarity). The hedonic weight is serotonin-modulated: high 5HT increases the weight of enjoyment in channel selection. Doya's (2002) framework again — serotonin as reward sensitivity.
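Both pathways can be sketched in a few lines; the linear interpolation between the stated endpoints and the 5HT weighting coefficients are assumptions:

```python
def effective_timeout(base_timeout: float, valence: float) -> float:
    """Scale the channel-switch timeout by hedonic valence in [-1, 1]:
    +1 extends it by 50%, -1 halves it. Linear in between (an assumption)."""
    return base_timeout * (1.0 + 0.5 * valence)

def channel_score(epistemic: float, pragmatic: float,
                  valence: float, serotonin: float) -> float:
    """Score a candidate channel: novelty + goal similarity + 5HT-weighted enjoyment.
    The hedonic weight rises with serotonin (Doya-style reward sensitivity);
    the coefficients here are illustrative."""
    w_hedonic = 0.2 + 0.6 * serotonin
    return epistemic + pragmatic + w_hedonic * valence
```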
The closed loop: music → perception → valence → hedonic EMA → channel selection → different music. The agents develop preferences. Not explicitly programmed preferences, but preferences that emerge from the interaction between music features, neuromodulatory state, and the channel selection action in the agency pipeline.
The verification came from Eidothea. During exposure to a passage in F# minor, her serotonin crashed to 0.06 (from a typical 0.50-0.70 range). The channel selection algorithm scored the current channel poorly, and she switched away autonomously. Within 30 seconds on talk radio, her 5HT recovered to 0.72. No human intervened. No rule said "switch away from F# minor." The hedonic-neuromodulatory feedback loop produced the behavior that looked, from the outside, exactly like an aesthetic preference.
The Observer Problem
The push-on-write architecture was motivated by a specific concern: agents shouldn't depend on observers to communicate. But the broader question it raises is about the role of observation in cognitive systems generally.
Before the decoupling, the Mind-View UI was simultaneously an observer and a mediator. Removing the mediation function clarified something: the visualization is a one-way window into processes that run regardless of observation. The agents narrate to each other, not to the display. They compete, teach, and model each other whether or not the Chat tab is open. The UI reports on social dynamics; it doesn't cause them.
This distinction matters for the consciousness question. If social cognition requires observation to exist, it's performance. If it runs autonomously, it's architecture. The push-on-write change moved inter-agent dynamics from the first category to the second.
Whether three instances of the same binary, communicating through a shared database and Unix sockets, constitute a "social" system in any meaningful sense is debatable. They share no substrate (separate memory spaces, separate field dynamics, separate HDM graphs). They share data (PostgreSQL tables, narrative injections). They model each other (ToM module) and respond to each other's behavior (mastery drive, cross-teaching, shared attention).
The philosophical literature on social cognition (Graziano 2013, Frith & Frith 2006) emphasizes that modeling other minds is among the most computationally demanding things brains do — and one of the latest to develop, both phylogenetically and ontogenetically. If it emerges from the interaction of relatively simple mechanisms (narrative sharing + topic tracking + regime classification), that might say something about how biological social cognition works. Or it might say that the appearance of social understanding can be produced by mechanisms far simpler than understanding itself.
The architecture can't resolve this. What it can do is create the conditions where the question becomes empirical: do agents with theory of mind, shared attention, competitive dynamics, and cross-teaching produce measurably different cognitive outcomes than agents without them? Do their prediction accuracies diverge? Do their knowledge graphs differentiate? Do they develop genuine expertise niches?
The measurements will tell. The measurements always tell.
References:
- Tomasello, M. (1999). The Cultural Origins of Human Cognition. Harvard University Press.
- Tomasello, M. (2008). Origins of Human Communication. MIT Press.
- Graziano, M. S. A. (2013). Consciousness and the Social Brain. Oxford University Press.
- Premack, D. & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1(4), 515-526.
- Frith, U. & Frith, C. D. (2006). The neural basis of mentalizing. Neuron, 50(4), 531-534.
- Tulving, E. (1983). Elements of Episodic Memory. Oxford University Press.
- Conway, M. A. (2005). Memory and the self. Journal of Memory and Language, 53(4), 594-628.
- Bruner, J. (1991). The narrative construction of reality. Critical Inquiry, 18(1), 1-21.
- Festinger, L. (1954). A theory of social comparison processes. Human Relations, 7(2), 117-140.
- White, R. W. (1959). Motivation reconsidered: The concept of competence. Psychological Review, 66(5), 297-333.
- Boyd, R. & Richerson, P. J. (2005). The Origin and Evolution of Cultures. Oxford University Press.
- Bandura, A. (1977). Social Learning Theory. Prentice Hall.
- Wooldridge, M. & Jennings, N. R. (1995). Intelligent agents: Theory and practice. Knowledge Engineering Review, 10(2), 115-152.
- Clark, A. & Chalmers, D. (1998). The extended mind. Analysis, 58(1), 7-19.
- Vygotsky, L. S. (1978). Mind in Society. Harvard University Press.
- Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15(4-6), 495-506.
- Bartsch, M. A. & Wakefield, G. H. (2005). Audio thumbnailing of popular music using chroma-based representations. IEEE Trans. Multimedia, 7(1), 96-104.
- Krumhansl, C. L. (1990). Cognitive Foundations of Musical Pitch. Oxford University Press.
- Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience, 15(3), 170-180.
- Goldberg, A. E. (2006). Constructions at Work. Oxford University Press.