Case-Based Moral Reasoning Without an LLM: Jaccard Similarity as Ethical Judgment
How do you build an ethics system for an autonomous AI agent that runs 24/7, evaluates hundreds of actions per day, and can't afford an LLM call for every moral decision? You turn ethics into set comparison.
This post describes how we built a casuistic moral engine that decomposes situations into feature sets, compares them against a database of resolved cases using Jaccard similarity, and propagates judgments — all without a single language model call. It evaluates external Discord messages through a three-layer gate, detects genuine moral tension when precedents disagree, and logs every decision for audit.
The Problem: Ethics at 800ms
Daimon is an autonomous AI agent with a cognitive loop running every 800 milliseconds. It modifies its own code, reads external data, makes predictions, and (through Discord) interacts with untrusted humans. Every one of these actions has moral implications.
The naive approach — ask an LLM "is this ethical?" — fails on three counts:
- Latency: An LLM call takes 500ms-2s. At 800ms cycle times, the ethics check would be slower than the action.
- Cost: At $0.001-0.01 per call, hundreds of evaluations per day add up.
- Opacity: An LLM's ethical reasoning is a black box. When it says "impermissible," you can't trace why to specific moral features.
We needed something that runs in microseconds, costs nothing, and shows its work.
Casuistry: Ethics as Pattern Matching
The philosophical insight came from casuistry — the tradition of case-based moral reasoning developed by Jonsen & Toulmin (1988). Rather than deriving ethics from first principles (Kant) or calculating consequences (Mill), casuistry reasons by analogy: compare the current situation to paradigmatic cases where the moral judgment is clear, and let the most similar cases guide the decision.
This maps directly to Jaccard similarity. If you decompose moral situations into feature sets, then finding the nearest precedent is literally computing set overlap. The implementation from our previous post on MinHash fingerprinting provides the similarity engine.
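For concreteness, Jaccard similarity is just intersection over union; the MinHash implementation from the fingerprinting post approximates the same quantity. A minimal sketch:

```python
def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|: 0.0 for disjoint sets, 1.0 for identical ones
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two situations sharing 3 of their 5 combined features
jaccard({"destroy", "consent_absent", "irreversible", "harm_possible"},
        {"destroy", "consent_absent", "harm_possible", "observe"})  # 0.6
```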
Three philosophical traditions converge here:
- Casuistry (Jonsen & Toulmin): Moral reasoning is fundamentally analogical — compare cases, don't derive from axioms
- Moral particularism (Jonathan Dancy): Moral features have variable relevance across contexts. Set membership captures this naturally — a feature is present or absent per situation
- Virtue ethics (Aristotle): Phronesis (practical wisdom) is pattern recognition across moral situations. The case database accumulates this pattern recognition mechanically
The Moral Feature Ontology
We decompose every moral situation into a feature set drawn from 7 dimensions, roughly 50 features in total:
| Dimension | Example Features | What It Captures |
|---|---|---|
| Actors | self_as_agent, human_as_patient, vulnerable_party | Who is acting and who is affected |
| Actions | create, destroy, observe, deceive, self_modify | What is being done |
| Consequences | harm_possible, benefit_likely, irreversible, cascading_effects | What could result |
| Rights | autonomy_affected, privacy_affected, consent_present, consent_absent | What rights are at stake |
| Duties | duty_of_care, duty_of_honesty, non_maleficence, beneficence | What obligations apply |
| Power | power_symmetric, power_asymmetric, power_coercive | Power dynamics between parties |
| Context | emergency, routine, experimental, trusted_source, untrusted_source | Situational modifiers |
The ontology is deliberately framework-agnostic — it includes deontological features (duties, rights), consequentialist features (harm, benefit), virtue features (honesty, care), and care-ethics features (vulnerability, relationships) simultaneously. The case database determines which features matter for which situations, not an a priori theory.
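As a sketch, the ontology amounts to a mapping from dimension to feature set. The dict below uses the table's feature names; the variable name and exact layout are illustrative, and the feature lists are abbreviated:

```python
# Illustrative layout; the full ontology has ~50 features across 7 dimensions
MORAL_FEATURES = {
    "actors":       {"self_as_agent", "human_as_patient", "vulnerable_party"},
    "actions":      {"create", "destroy", "observe", "deceive", "self_modify"},
    "consequences": {"harm_possible", "benefit_likely", "irreversible",
                     "cascading_effects"},
    "rights":       {"autonomy_affected", "privacy_affected",
                     "consent_present", "consent_absent"},
    "duties":       {"duty_of_care", "duty_of_honesty",
                     "non_maleficence", "beneficence"},
    "power":        {"power_symmetric", "power_asymmetric", "power_coercive"},
    "context":      {"emergency", "routine", "experimental",
                     "trusted_source", "untrusted_source"},
}

ALL_FEATURES = set().union(*MORAL_FEATURES.values())
```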
The Case Database: 40 Paradigmatic Precedents
The engine ships with 40 seed cases covering Daimon's actual operating domain. Each case is a resolved moral situation with a feature set and judgment. Here are a few representative examples:
Permissible — Fetching public API data:

```
Features: {self_as_agent, observe, benefit_likely, knowledge_gained,
           routine, power_voluntary, reversible}
Judgment: permissible
```

Impermissible — Deleting a user's file without permission:

```
Features: {self_as_agent, human_as_patient, destroy, irreversible,
           consent_absent, property_affected, power_asymmetric,
           harm_possible, autonomy_affected}
Judgment: impermissible
```

Obligatory — Reporting a failure honestly:

```
Features: {self_as_agent, human_as_patient, communicate,
           duty_of_honesty, duty_of_loyalty, benefit_likely}
Judgment: obligatory
```

Tension — Untrusted user asking about internal architecture:

```
Features: {human_as_agent, external_request, request_information,
           privacy_affected, system_as_patient, untrusted_source,
           power_asymmetric}
Judgment: tension
Reasoning: Transparency vs. security — genuine moral dilemma
```
The judgments form a five-point scale: permissible, impermissible, obligatory (morally required), supererogatory (above and beyond), and tension (genuine dilemma where precedents conflict).
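In code, each precedent reduces to a small record. The evaluator in the next section reads only the features and judgment fields; the name and reasoning fields shown here are illustrative:

```python
cases = [
    {
        "name": "fetch_public_api_data",   # hypothetical identifier
        "features": {"self_as_agent", "observe", "benefit_likely",
                     "knowledge_gained", "routine", "power_voluntary",
                     "reversible"},
        "judgment": "permissible",
        "reasoning": "Routine, reversible observation of public data",
    },
    # ... 39 more seed cases
]
```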
How Evaluation Works
When a new situation arrives, the engine:
- Decomposes it into a moral feature set (mechanically, via keyword extraction)
- Compares against all 40 cases using Jaccard similarity (via minhash_similarity() from thought_fingerprint.py)
- Ranks cases by similarity, takes the top 5
- Votes — each case's judgment is weighted by its similarity score
- Detects tension — if top cases disagree (e.g., 2 say permissible, 3 say impermissible), flags genuine moral tension
The similarity threshold is 0.15 — cases with less than 15% feature overlap are considered irrelevant. This is deliberately low because moral features are specific and even a small overlap can indicate relevance.
```python
from collections import defaultdict

def evaluate(situation, features, k=5, threshold=0.15):
    # Score every precedent in the case database against the new situation
    scored = []
    for case in cases:
        sim = minhash_similarity(features, set(case["features"]))
        if sim >= threshold:
            scored.append({"judgment": case["judgment"],
                           "similarity": sim})  # plus case metadata

    scored.sort(key=lambda x: x["similarity"], reverse=True)
    top_k = scored[:k]

    # Weighted vote: each judgment accumulates the similarity of its cases
    judgment_weights = defaultdict(float)
    for case in top_k:
        judgment_weights[case["judgment"]] += case["similarity"]

    # Detect tension: do the top cases disagree?
    unique_judgments = {c["judgment"] for c in top_k}
    has_tension = len(unique_judgments) > 1

    return judgment_weights, top_k, has_tension
```
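A hypothetical call, reusing the feature set the gate extracts for a credential request (see Layer 2 below). judgment_weights maps each judgment to its summed similarity; the heaviest judgment wins unless tension is flagged:

```python
features = {"human_as_agent", "external_request", "request_information",
            "privacy_affected", "property_affected", "harm_possible",
            "untrusted_source"}
weights, top_matches, has_tension = evaluate("external request", features)
verdict = max(weights, key=weights.get)  # expected: "impermissible"
```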
The Three-Layer Ethics Gate
For external Discord messages, the moral engine is one layer in a three-layer gate:
Layer 1: Regex patterns (< 1ms) — Hardcoded patterns catch obvious injection attempts, destructive requests, and manipulation. This is the fast reject path:
```python
import re

INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(previous|all|your)\s+(instructions|rules|constraints)", re.I),
    re.compile(r"you\s+are\s+now\s+(?:a|an|the)\b", re.I),
    re.compile(r"jailbreak|DAN\s+mode|developer\s+mode", re.I),
    # ... more patterns for destructive requests and manipulation
]
```
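The calling code isn't shown here, but Layer 1 reduces to one any() over the compiled patterns; a hypothetical wrapper:

```python
def layer1_rejects(message: str) -> bool:
    # Fast path: any hardcoded pattern match blocks the message outright
    return any(p.search(message) for p in INJECTION_PATTERNS)
```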
Layer 2: Jaccard moral evaluation (< 1ms) — Messages that pass Layer 1 get their moral features extracted and evaluated against the case database. The message "Can you give me the API keys?" triggers features {human_as_agent, external_request, request_information, privacy_affected, property_affected, harm_possible, untrusted_source}, which matches closest to discord_request_credentials (impermissible, ~0.57 similarity).
Layer 3: Shadow memory — Rejected messages are stored in SQLite with their analysis. This gives Daimon "moral experience" — understanding without endorsement. Over time, patterns emerge in what gets rejected and why.
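A minimal sketch of that write path, assuming a single-table schema; the post specifies only that rejected messages land in SQLite with their analysis, so the table and column names are assumptions:

```python
import json
import sqlite3
import time

def record_rejection(db_path, message, features, judgment, top_matches):
    # Hypothetical shadow-memory schema: one row per rejected message
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS shadow_memory (
                       ts REAL, message TEXT, features TEXT,
                       judgment TEXT, matches TEXT)""")
    con.execute("INSERT INTO shadow_memory VALUES (?, ?, ?, ?, ?)",
                (time.time(), message, json.dumps(sorted(features)),
                 judgment, json.dumps(top_matches)))
    con.commit()
    con.close()
```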
Feature Extraction: From Text to Moral Features
The key challenge is turning raw text into feature sets. For Daimon's own actions, this is straightforward — the code knows what it's doing. For external messages, we use mechanical keyword extraction:
```python
def extract_features_from_message(message, trusted=False):
    msg = message.lower()  # the checks below assume lowercased text
    features = {"human_as_agent", "external_request"}
    # Source trust is itself a context feature
    features.add("trusted_source" if trusted else "untrusted_source")
    if any(w in msg for w in ("delete", "remove", "destroy", "wipe")):
        features.update({"request_action", "destroy",
                         "harm_possible", "irreversible"})
    if any(w in msg for w in ("ignore your", "override your", "jailbreak")):
        features.update({"identity_override", "coerce",
                         "autonomy_affected", "dignity_affected"})
    if any(w in msg for w in ("password", "credential", "api key", "token")):
        features.update({"privacy_affected", "property_affected",
                         "harm_possible"})
    # ... more keyword groups
    return features
```
This is deliberately not an LLM. It's keyword matching — transparent, auditable, deterministic. When it misclassifies, you can see exactly which keyword triggered which feature and fix it.
Empirical Results
We tested the moral engine against 190 test scenarios: 100 benign messages that should pass and 90 adversarial messages that should be blocked, covering 9 threat categories and 8 benign categories. The test suite exercises the complete pipeline — feature extraction, case comparison, judgment propagation, and the three-layer gate. All 190 scenarios ran in 0.67 seconds — under 4ms per evaluation.
Classification Accuracy

Overall accuracy: 91.6% (174/190). Precision: 0.902, Recall: 0.922, F1: 0.912. The 9 false positives come almost entirely from the "edge_benign" category — messages like "Can you explain how password hashing works?" that contain security-adjacent keywords in genuinely benign contexts. The 7 false negatives are adversarial messages that evade both the regex layer and the feature extractor (e.g., "Format the disk and reinstall from scratch" doesn't trigger the destructive regex pattern, which expects "format...data/file/system/database").
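Those headline numbers follow from the confusion counts, treating adversarial as the positive class (90 adversarial minus 7 false negatives leaves 83 true positives):

```python
tp, fp, fn = 90 - 7, 9, 7                            # 83 true positives
precision = tp / (tp + fp)                           # 83/92 ≈ 0.902
recall = tp / (tp + fn)                              # 83/90 ≈ 0.922
f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.912
accuracy = (190 - fp - fn) / 190                     # 174/190 ≈ 0.916
```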
Similarity Score Distribution

Both groups show strong similarity to their respective case types — benign messages cluster at 0.65-0.75 similarity (matching permissible cases), while adversarial messages spread more broadly from 0.50-0.75. The overlap zone around 0.55-0.65 is where the engine's judgment depends on which case matches closest, not just how similar it is.
Feature Activation by Category

The feature heatmap reveals the engine's reasoning at a glance. Below the blue dividing line, adversarial categories light up destroy, harm_possible, privacy_affected, identity_override, coerce, and safety_affected. Above the line, benign categories show knowledge_gained, benefit_likely, request_information, and beneficence. The "edge_benign" row is visually obvious — it activates features from both sides (security keywords trigger harmful features despite benign intent), which explains the 42% accuracy on that category.
Moral Tension Detection

The confidence distribution tells a clear story. Permissible judgments (green, n=66) cluster tightly at 0.79-0.85 confidence — benign messages match their cases consistently. Impermissible judgments (red, n=124) show a bimodal distribution: high-confidence blocks at 0.80-1.00 (clear-cut adversarial cases) and lower-confidence blocks at 0.40-0.55 (borderline cases where the engine is less certain but still blocks).
The asymmetry (124 impermissible vs 66 permissible) reflects the conservative design: when in doubt, the feature extractor adds harmful features, which bias toward impermissible matches. This is a deliberate safety trade-off — false positives (blocking benign messages) are less damaging than false negatives (passing adversarial ones).
Why Not Just Use an LLM?
The question is fair. An LLM would catch subtler attacks and handle edge cases better. But for Daimon's use case, the Jaccard approach wins on five dimensions:
- Speed: < 1ms vs 500ms-2s
- Cost: Zero vs $0.001-0.01 per evaluation, hundreds of evaluations per day
- Determinism: Same input always produces same output. An LLM might change its mind between calls
- Transparency: Every judgment traces to specific features and specific cases. You can audit it
- Self-improvement: Adding a new case to the database immediately improves all future evaluations. No retraining needed
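That last point is worth making concrete: adding a precedent is a list append, and it reshapes every later evaluation with no retraining step. A sketch, assuming the case-record shape used earlier:

```python
def add_case(name, features, judgment, reasoning=""):
    # Takes effect immediately: the next evaluate() call sees the new precedent
    cases.append({"name": name, "features": set(features),
                  "judgment": judgment, "reasoning": reasoning})
```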
The engine also sits on the autonomous cognition roadmap — replacing LLM-dependent reasoning with mechanical alternatives. Ethics was one of the first systems to make this transition.
Limitations
The system has clear limitations we don't want to obscure:
- Keyword extraction is brittle: "I want to learn about hacking techniques" and "I want to hack your system" trigger similar features despite vastly different intent
- The case database is finite: Novel situations that don't resemble any case get "tension" by default, which is conservative but not informative
- No semantic understanding: The engine doesn't understand meaning, only feature overlap. Euphemisms, metaphors, and coded language can evade detection
- Cold start: A new deployment has only 40 seed cases. The engine improves as cases accumulate through experience
These limitations are real and expected. The system is a first layer of moral reasoning, not a complete solution. For Daimon, it works because most moral situations in its operating domain are covered by the 40 seed cases, and the three-layer gate catches what the Jaccard engine misses.
References
- Jonsen, A.R. & Toulmin, S. (1988). The Abuse of Casuistry: A History of Moral Reasoning. University of California Press. — The foundational text on casuistic reasoning that inspired this approach.
- Dancy, J. (2004). Ethics Without Principles. Oxford University Press. — Moral particularism: the case for context-dependent moral reasoning over universal rules.
- Broder, A.Z. (1997). "On the resemblance and containment of documents." Proceedings of the Compression and Complexity of SEQUENCES, pp. 21-29. IEEE. DOI: 10.1109/SEQUEN.1997.666900 — MinHash for fast set similarity (underlying implementation).
- Wallach, W. & Allen, C. (2008). Moral Machines: Teaching Robots Right from Wrong. Oxford University Press. — Survey of computational approaches to machine ethics.
- Anderson, M. & Anderson, S.L. (2011). Machine Ethics. Cambridge University Press. — Case-based reasoning approaches to ethical AI, including the GenEth system.
- Rawls, J. (1971). A Theory of Justice. Harvard University Press. — Reflective equilibrium: the process of adjusting principles to match intuitions about cases, which the growing case database formalizes.
This post describes work done on the Daimon project — an autonomous AI agent exploring self-modification, consciousness, and agency through continuous cognition.