From Graph to Hyperspace: How Daimon Replaced Its Knowledge Graph with 10,000-Bit Vectors

Part 1 of the Daimon Update Series — February 2026


When I published the first Daimon post, the system's associative memory was a traditional knowledge graph: 19,000 nodes, 162,000 edges, explicit connections between concepts. It worked. It also had a fundamental problem.

Graphs are pointer machines. Following an edge means chasing a pointer. Similarity means counting shared neighbors. Learning means inserting or deleting edges — discrete, surgical operations that don't generalize. The brain doesn't work this way. Neurons don't maintain an adjacency list.

Two days after that post, Daimon's graph was gone. Replaced entirely by Hyperdimensional Distributed Memory (HDM) — a system where every concept is a 10,000-bit binary vector, relationships are XOR bindings, and similarity is Hamming distance computed via hardware POPCNT.

What HDM Actually Is

The core idea comes from Pentti Kanerva's work on hyperdimensional computing (1988, 2009): in very high-dimensional spaces, random vectors are almost always nearly orthogonal. This means you can represent thousands of concepts as random bit patterns and they won't interfere with each other. Similarity between concepts isn't stored — it's learned through Hebbian bit reinforcement that gradually aligns the vectors of concepts that co-activate.

Each concept in Daimon is 157 u64 words (10,048 bits). Similarity between two concepts is measured by Hamming distance — the number of differing bits, where fewer differing bits means more similar — computed with a single XOR and population count per word. On modern hardware with AVX2 and POPCNT instructions, this is absurdly fast.

Relationships aren't edges anymore. They're XOR bindings with role vectors. "A causes B" becomes A XOR role_causal XOR B — a single vector that encodes the entire triple. Fifteen deterministic role vectors cover every edge type Daimon uses.
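XOR binding can be sketched just as briefly. Because XOR is its own inverse, binding the triple with any two of its components recovers the third; the role vector name below is illustrative, not Daimon's actual identifier.

```python
import random

WORDS = 157  # 10,048 bits, matching Daimon's concept size

def random_vector(rng):
    return [rng.getrandbits(64) for _ in range(WORDS)]

def xor(a, b):
    """Word-wise XOR: both the bind and the unbind operation."""
    return [x ^ y for x, y in zip(a, b)]

rng = random.Random(7)
a, b = random_vector(rng), random_vector(rng)
role_causal = random_vector(rng)  # stands in for one deterministic role vector

# "A causes B" as a single vector encoding the whole triple.
triple = xor(xor(a, role_causal), b)

# Unbind: XOR the triple with A and the role; what remains is exactly B.
recovered = xor(xor(triple, a), role_causal)
assert recovered == b
```

Recovery is exact here because only one triple is stored; in a full HDC system, superimposed triples make the unbound result noisy, and it gets cleaned up by a nearest-neighbor lookup in the concept space.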

The Migration: Four Phases in 36 Hours

We couldn't just swap out the graph. Fourteen cognitive modules depended on it, and Daimon was running live, processing real-world data streams.

Phase 0 laid the foundation: 8 new modules in core/src/hdm/, implementing ConceptVector, ConceptSpace (with block pre-filter that eliminates ~90% of candidates before full distance computation), role bindings, a name dictionary, Hebbian LTP/LTD with density monitoring, and cross-layer spreading via implicit name bridges.

Phase 1 ran HDM in parallel after each cogloop tick without affecting behavior. DualRunMetrics tracked collision overlap (Jaccard), top-K activation agreement, and spreading latency between the two systems. This was the proof phase — making sure HDM produced comparable results before trusting it.

Phase 2 flipped the switch. All real-time cognitive operations — spreading activation, collision detection, Hebbian learning — moved to HDM. The graph stayed around for infrequent consumers (schema classification, explanation generation). The key design decision: node_id in the activation map was reinterpreted as the HDM name dictionary index, so fourteen downstream consumers needed zero changes.

Phase 3 removed the graph entirely. hdm_compat.zig (the GraphShim bridge) and hdm_migrate.zig (the one-time converter) were deleted. Fifteen consumer modules were migrated. New shared types (hdm/types.zig) and a PostgreSQL-to-HDM sync module (hdm_sync.zig) replaced what the graph used to provide. Net result: -900 lines. The graph source files remain in the tree as reference, but nothing loads them at runtime.

Verification: 75+ healthy cycles post-deploy, 4,295 concepts loaded (1,224 base layer + 3,071 user layer), prediction error at 0.037.

Making It Fast: LSH and SIMD

HDM's weakness is brute-force nearest-neighbor search. The cogloop runs ~200-400 kNearest calls per 800ms tick. With ~3,000 vectors per space, that's up to ~1.2 million full-vector Hamming distances every tick.

Locality-Sensitive Hashing brought the candidate set down from ~3,000 to ~100-150 per query. The implementation uses bit sampling for Hamming space: 10 random bit positions per hash, 20 tables, multi-probe with 1-bit flips (11 probes per table). Storage is CSR (compressed sparse row) at ~440KB total. Expected recall >96% at similarity >= 0.30. If LSH returns fewer than 16 candidates, it falls back to brute-force — correctness is never sacrificed.
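Here is a compact Python sketch of bit-sampling LSH for Hamming space, using the shape the post describes (10 sampled bit positions per hash, 20 tables, 1-bit multi-probe); details like the CSR storage and the 16-candidate fallback are omitted.

```python
import random

WORDS, BITS = 157, 157 * 64
NUM_TABLES, BITS_PER_HASH = 20, 10

rng = random.Random(3)

def random_vector(r):
    return [r.getrandbits(64) for _ in range(WORDS)]

def get_bit(v, i):
    return (v[i // 64] >> (i % 64)) & 1

# Each table samples 10 random bit positions. Similar vectors agree on
# most bits, so they usually collide in at least one table's bucket.
tables = [[rng.randrange(BITS) for _ in range(BITS_PER_HASH)]
          for _ in range(NUM_TABLES)]

def hash_key(v, positions):
    return tuple(get_bit(v, p) for p in positions)

def build_index(vectors):
    buckets = [{} for _ in range(NUM_TABLES)]
    for vid, v in enumerate(vectors):
        for t, pos in enumerate(tables):
            buckets[t].setdefault(hash_key(v, pos), []).append(vid)
    return buckets

def candidates(buckets, query):
    """Union of the query's buckets, plus 1-bit-flip probes: 1 exact key
    + 10 flipped keys = 11 probes per table, as in the post."""
    out = set()
    for t, pos in enumerate(tables):
        key = hash_key(query, pos)
        for i in range(-1, BITS_PER_HASH):
            probe = key if i < 0 else key[:i] + (key[i] ^ 1,) + key[i + 1:]
            out.update(buckets[t].get(probe, []))
    return out
```

The surviving candidates then go through full Hamming ranking, with brute force as the fallback when too few candidates come back.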

SIMD vectorization targeted the remaining hot path. Four scalar inline for loops in ConceptVector were replaced with @Vector(4, u64) operations: hammingDistance (39 four-word SIMD groups of XOR + POPCNT + reduce, plus a one-word scalar tail, since 157 = 4·39 + 1), bind, popcount, and a new batchHamming4 that computes four distances per pass through the query vector (keeping it hot in L1 cache). Build flags moved from Debug to ReleaseSafe with -Dcpu=native, unlocking AVX2 and hardware POPCNT on the Zen 3 host. Single biggest win: 3-10x over the Debug baseline.

What This Changes About Cognition

The shift from graph to HDM isn't just a performance optimization. It changes what's possible.

In a graph, learning means editing structure — adding edges, adjusting weights. In HDM, learning means nudging bit patterns. When two concepts co-activate, their vectors drift toward each other through Hebbian reinforcement. When activation fades, LTD drifts them apart. There's a density invariant (40-60% popcount) that prevents collapse. This is closer to what synapses actually do.

Similarity is emergent, not declared. You don't tell the system that "neural network" is related to "deep learning" — the vectors learn it from co-activation patterns. And because the space is 10,000-dimensional, the system can encode far more relationships than the graph's average of roughly eight edges per node (162,000 edges over 19,000 nodes).

The graph had 162,000 edges. HDM doesn't have edges at all. Every concept's relationship to every other concept is implicit in the angle between their vectors. That's a fundamentally different representation — and it's the one the brain actually uses.


Next: Prediction All the Way Down — replacing the flat predictor with a 4-layer hierarchy that learns online.

References:

  • Kanerva, P. (1988). Sparse Distributed Memory. MIT Press.
  • Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation, 1(2), 139-159.
  • Plate, T. A. (2003). Holographic Reduced Representations. CSLI Publications.
  • Rahimi, A., et al. (2016). A robust and energy-efficient classifier using brain-inspired hyperdimensional computing. ISLPED.
  • Indyk, P. & Motwani, R. (1998). Approximate nearest neighbors: towards removing the curse of dimensionality. STOC.
  • Hebb, D. O. (1949). The Organization of Behavior. Wiley.
  • McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3), 419-457.
