Expert Systems Using the Rete Algorithm
For decades, rule-based expert systems stood at the forefront of Artificial Intelligence. While deep learning models excel at recognition, perception, and approximate reasoning, rule-based systems remain the gold standard for applications requiring deterministic reasoning, absolute compliance, explicit logic, and high interpretability.
At the heart of the most powerful forward-chaining expert systems is the Rete Algorithm, designed by Charles Forgy in 1974. Rete solves a fundamental scaling challenge: in a naive production system, evaluating $R$ rules against $W$ working memory elements requires $O(R \times W)$ checks on every cycle. As working memory or the rule base grows, performance collapses. Rete achieves high efficiency by compiling rules into a dataflow network that evaluates conditions incrementally, trading computer memory (to cache partial matches) for execution speed.
This chapter walks through a modern, lightweight, and idiomatic Python implementation of the Rete algorithm. We will discuss its design, examine some of the trickiest details in Rete implementation, and analyze six complete, real-world case studies to offer practical advice on designing production-grade expert systems.
All code for this chapter is located in the source-code/rete-algorithm directory.
1. Rete Engine Architecture & Design
Our implementation compiles declarative rules into a unified dataflow graph containing two main sections: the Alpha Network and the Beta Network.
The Working Memory (WM)
Working memory stores active assertions, represented as Working Memory Elements (WMEs). In source-code/rete-algorithm/rete/facts.py, user facts are defined as frozen Python @dataclass objects inheriting from Fact. This gives us hashability and immutability out of the box.
When a fact is asserted, the WorkingMemory wraps it in a WME container with a unique, monotonically increasing id and a timestamp indicating its recency.
The Alpha Network (Intra-Fact Filtering)
The Alpha Network performs single-fact filtering (e.g., checking if a patient’s temperature is greater than 38.5).
- Type Dispatch: The root
AlphaNetworkdirects incoming WMEs directly to branches corresponding to their specific Python class type using an $O(1)$ type lookup dictionary. - Alpha Test Chains: A chain of
AlphaTestNodegates filters the fact’s fields against constant criteria. - Node Sharing: If multiple rules check the exact same condition (e.g.,
temperature > 38.5), the network shares the corresponding alpha test nodes and routes them to a single sharedAlphaMemorynode, avoiding redundant checks.
The Beta Network (Inter-Fact Joining)
The Beta Network manages multi-fact relations, checking for variable consistency across different conditions (e.g., joining Patient(name=Var("n")) with Diagnosis(patient=Var("n")) where both must refer to the same patient name).
- Tokens: A
Tokenrepresents a partial match flowing down the beta network. It is structured as a linked list chain pointing to a parent token and a matched WME, allowing memory-efficient prefix sharing. - Join Nodes:
JoinNodeinstances cross-reference incoming tokens from their left input (Beta Memory) with WMEs from their right input (Alpha Memory) to verify variable bindings. - Negative Nodes:
NegativeNodeinstances implement Negated Condition Elements (NCEs), allowing rules to match only in the absence of matching facts. - Production Nodes: Terminal
ProductionNodeinstances represent fully matched rules. When a token propagates here, it constructs a ruleInstantiationand adds it to the conflict set.
Conflict Resolution & Recognizing Cycles
On each cycle, the ReteEngine selects one rule instantiation to fire from the conflict set. It resolves conflicts using:
- Refraction: Ensures that a specific combination of facts fires a rule at most once, preventing infinite execution loops on stable data.
- Salience: A developer-defined priority number (higher priority rules fire first).
- Recency (LEX/MEA): Prefers instantiations matching the most recently asserted facts (highest timestamps) to focus reasoning on current events.
2. Implementing the Trickier Parts of Rete
Writing a Rete network reveals several subtle challenges. Below, we examine three of the most complex implementation details in our engine.
2.1 The Retraction Cleanup & Token Descendant Bug
When a fact is retracted, the change must propagate down the entire Rete network to remove invalid partial matches. Naive Rete implementations often clean up memory lists checking t is not token. This fails because as a token flows through JoinNode instances, it is extended to a new child token object containing the newly joined WME.
To fix this, we implemented a robust ancestor-descendant lookup function is_descendant that traverses parent pointers up the token chain:
1 def is_descendant(t: Token, token: Token) -> bool:
2 """Return True if *t* is a descendant of *token* (or is *token* itself)."""
3 curr: Token | None = t
4 while curr is not None:
5 if curr is token:
6 return True
7 curr = curr.parent
8 return False
Using this helper, any memory node (BetaMemory, NegativeNode tables, or ProductionNode sets) can confidently purge matches downstream:
1 def left_remove_token(self, token: Token) -> None:
2 """Remove all tokens that are descendants of *token* and propagate."""
3 surviving = [t for t in self.tokens if not is_descendant(t, token)]
4 self.tokens = surviving
5 for child in self.children:
6 child.left_remove_token(token)
2.2 Negated Condition Elements (NCEs)
Implementing negative matching (e.g., ~Pat(Diagnosis, patient=Var("n"))) requires tracking blocker relationships inside NegativeNode.
- A token is blocked if one or more WMEs in the alpha memory satisfy the join constraints.
- The node maintains a
_blockedtable mapping tokens to their blocking WMEs, and a_passedlist for tokens with zero blockers. - If a new WME arrives and matches a passed token, that token is retracted from downstream nodes and moved to the blocked table.
- If a blocking WME is retracted, the node checks if the token has any remaining blockers. If its blockers count hits zero, the token is released downstream.
2.3 Deferred Working Memory Mutations
Rule actions (RHS) frequently assert, retract, or modify facts. If these operations propagate immediately during RHS execution, they can alter the Rete network state mid-execution, leading to recursive corruption.
We resolve this by introducing RuleContext, which queues mutations inside deferred lists:
1 class RuleContext:
2 def assert_fact(self, fact: Fact) -> None:
3 self._pending_asserts.append(fact)
The ReteEngine runs the action, lets it return, and then replays the mutations in a safe, atomic sequence (retractions first, then assertions) to update the network cleanly.
3. Case Studies & Design Advice
We now examine six case studies showcasing the engine in action, highlighting how to write and structure rules.
3.1 Medical Diagnosis (Forward Chaining & Negation)
The medical diagnosis expert system (source-code/rete-algorithm/example_medical.py) implements standard clinical decision support.
1 @engine.rule(
2 Pat(Patient, name=Var("n"), temperature=Gt(38.5), symptoms=Contains("cough")),
3 ~Pat(Diagnosis, patient=Var("n"), condition=Eq("flu")),
4 salience=10
5 )
6 def diagnose_flu(ctx, n):
7 ctx.assert_fact(Diagnosis(patient=n, condition="flu"))
Explanation & Design Advice
This case study uses forward chaining to derive high-level conclusions (Diagnosis) from raw measurements (Patient).
- Variable Joins: The variable
Var("n")binds the patient’s name in the first pattern and enforces that the negative condition checks for a flu diagnosis for that specific patient. - NCE Guards: The negative pattern
~Pat(...)serves as a guard. It prevents the rule from firing repeatedly and generating redundant diagnoses. - System Design Lesson: Always use negated patterns to check for the existence of your inferred facts before asserting them, avoiding cycle bloat.
3.2 Smart Home Automation & Sensor Fusion
The smart home coordinator (source-code/rete-algorithm/example_smart_home.py) handles device automation based on occupancy, temperature, and light sensors.
1 @engine.rule(
2 Pat(Occupancy, room=Var("r"), status=Eq("occupied")),
3 Pat(SensorReading, room=Var("r"), type=Eq("light"), value=Lt(100.0)),
4 Pat(DeviceStatus, device_id=Var("d"), room=Var("r"), type=Eq("light"), state=Eq("off")),
5 ~Pat(HouseMode, mode=Eq("away")),
6 )
7 def turn_on_lights(ctx, r, d):
8 ctx.modify(d, state="on")
Explanation & Design Advice
This example highlights a critical rule system hazard: infinite rule loops.
In our first iteration, when the house switched to "away" mode, the auto_shutoff_away rule turned off all active devices. However, because the room was occupied and dark, the turn_on_lights rule immediately reactivated them, creating a continuous loop of turning devices on and off.
- The Solution: We added the negated condition
~Pat(HouseMode, mode=Eq("away"))to block the automation rules when the house is vacant. - System Design Lesson: When writing rules that perform mutations, always map out their interactions. If Rule A changes a state that Rule B reacts to, ensure they have explicit state guards (such as active modes or status flags) to establish a clear hierarchy.
3.3 Network Intrusion Detection & Threat Alerting
The network intrusion detection system (source-code/rete-algorithm/example_network_security.py) monitors connection attempts and log events.
1 @engine.rule(
2 Pat(LoginAttempt, ip=Var("ip"), result=Eq("fail")),
3 Pat(FailedLoginCounter, ip=Var("ip"), count=Var("c")),
4 )
5 def increment_failed_login_counter(ctx, ip, c):
6 for wme in ctx.token_wmes:
7 if isinstance(wme.fact, FailedLoginCounter):
8 ctx.modify(wme, count=c + 1)
9 elif isinstance(wme.fact, LoginAttempt):
10 ctx.retract(wme) # Consume the event
Explanation & Design Advice
Network systems must process a high frequency of raw incoming events (LoginAttempt). If we simply asserted every event into memory, the Rete network would suffer from a combinatorial explosion of joins.
- Event Consumption Pattern: Rather than retaining raw events, the rules consume them. When a login fails, we modify the persistent
FailedLoginCounterand immediately retract the rawLoginAttemptevent. - Stateful Aggregates: High-level rules then reason about the consolidated counters (e.g., blocking the IP if
count > 3). - System Design Lesson: Categorize your facts into events (temporary, trigger rules, and then retracted) and states (persistent summaries updated by rules). This keeps working memory small and ensures the Rete network remains highly responsive.
3.4 E-Commerce Pricing & Discount Engine
The pricing system (source-code/rete-algorithm/example_ecom_pricing.py) calculates orders by combining loyalty tier discounts, bulk order price drops, and seasonal promotional campaigns.
1 @engine.rule(
2 Pat(CartSummary, customer_id=Var("c_id"), subtotal=Var("sub"), total_discount=Var("td")),
3 ~Pat(CartItem, customer_id=Var("c_id"), processed=Eq(False)),
4 ~Pat(AppliedDiscount, customer_id=Var("c_id"), processed=Eq(False)),
5 ~Pat(Invoice, customer_id=Var("c_id")),
6 )
7 def create_invoice(ctx, c_id, sub, td):
8 ctx.assert_fact(Invoice(customer_id=c_id, subtotal=sub, discount=td, total=sub-td))
Explanation & Design Advice
This case study demonstrates the Barrier Completion pattern. Generating an invoice must happen only after all cart items are totaled and all eligible discounts are fully applied.
- NCE Barriers: We declare negated patterns checking for any
CartItemorAppliedDiscountthat hasprocessed=False. - Execution Phases: The Rete engine processes individual item accumulations incrementally. Only when the count of unprocessed items hits zero does the
create_invoicerule satisfy its negative conditions and execute. - System Design Lesson: Use negated attributes as synchronization barriers to structure multi-phase workflows in an otherwise asynchronous rule evaluation cycle.
3.5 Financial Portfolio Rebalancing
The portfolio advisor (source-code/rete-algorithm/example_portfolio_rebalance.py) monitors asset values and recommends buy/sell trades to align portfolios with target percentages.
1 @engine.rule(
2 Pat(PortfolioSummary, portfolio_id=Var("p_id"), total_value=Var("tot")),
3 Pat(AssetAllocation, portfolio_id=Var("p_id"), asset_class=Var("ac"), current_value=Var("curr")),
4 Pat(TargetAllocation, portfolio_id=Var("p_id"), asset_class=Var("ac"), target_pct=Var("t_pct")),
5 ~Pat(AssetAllocation, portfolio_id=Var("p_id"), processed=Eq(False)),
6 ~Pat(RebalanceTrade, portfolio_id=Var("p_id"), asset_class=Var("ac")),
7 )
8 def check_drift_and_rebalance(ctx, p_id, ac, curr, t_pct, tot):
9 current_pct = curr / tot
10 drift = current_pct - t_pct
11 if abs(drift) > 0.05:
12 ctx.assert_fact(RebalanceTrade(portfolio_id=p_id, asset_class=ac, action="sell" if drift > 0 else "buy", ...))
Explanation & Design Advice
This case study highlights the balance between LHS matching and RHS execution logic.
- Coarse-Grained LHS: We use the Rete LHS to match the relevant holdings, targets, and total portfolio sums.
- Fine-Grained RHS Math: Rather than trying to express complex mathematical functions (like absolute percentage deviations) directly in the declarative patterns, we evaluate them using Python code in the RHS action. If the threshold is crossed, we assert the trade recommendation.
- System Design Lesson: Keep Rete patterns focused on structural matching and variable binding. Offload complex mathematical equations and multi-variable threshold calculations to the RHS action to keep rules maintainable and performant.
3.6 Hospital Patient Triage & Resource Allocation
The hospital triage scheduler (source-code/rete-algorithm/example_hospital_triage.py) routes arriving patients to empty beds and available staff.
1 @engine.rule(
2 Pat(Patient, id=Var("p_id"), severity=Gt(3)),
3 Pat(PatientStatus, patient_id=Var("p_id"), status=Eq("waiting")),
4 Pat(Bed, id=Var("b_id"), occupied=Eq(False)),
5 Pat(Staff, id=Var("s_id"), role=Eq("doctor"), busy=Eq(False)),
6 salience=10,
7 )
8 def triage_critical_patient(ctx, p_id, b_id, s_id):
9 ctx.assert_fact(Assignment(patient_id=p_id, bed_id=b_id, staff_id=s_id))
10 # Mark resources as occupied/busy...
Explanation & Design Advice
This scheduler resolves a classic resource allocation problem where multiple resources must be matched dynamically under priority constraints.
- Salience Priorities: By setting
salience=10on the critical triage rule andsalience=5on the standard triage rule, the engine guarantees that severe cases are assigned first. - Dynamic Resource Releasing: A lower salience rule (
salience=1) processes treatment completion, retracting assignments and marking beds and staff as available. Once resources are released, the high-salience rules immediately reactivate on the remaining patient queue. - System Design Lesson: Use salience systematically to establish scheduling priorities, and design resource-release cycles to allow continuous, incremental optimization of resource allocations.
Wrap Up
Why Use Embedded Expert Systems?
Even in an era dominated by large language models and deep neural networks, small, embedded expert systems using algorithms like Rete remain highly relevant for specific design constraints:
- Verifiability and Safety: Expert systems are fully deterministic. Unlike LLMs, they do not suffer from “hallucinations,” and they provide an explicit execution audit trail. This makes them ideal for mission-critical tasks like medical triage, security filtering, or automated financial trading where logic must be mathematically verifiable.
- Speed and Efficiency: Rete compiles rule logic into an optimized, in-memory dataflow network. A local Rete evaluation cycle executes in fractions of a millisecond with zero API latency, zero token costs, and negligible compute overhead, making it perfectly suited for real-time edge environments (like home automation and IoT devices).
- Decoupled Business Logic: Expert systems allow developers to write rules in a highly declarative format. Business logic is separated from database routing or application control flow, allowing rules to be modified, extended, or audited independently.
When to Use Other Techniques
While expert systems are powerful, they are not a silver bullet in the modern age of strong AI implemented with LLMs. You should consider alternative approaches when dealing with other problem shapes:
- Deep Learning and Machine Learning: For perception tasks (image classification, audio recognition) or statistical prediction (forecasting churn, estimating house values) where logic cannot be easily written as binary rules, machine learning is essential. Neural networks excel at mapping messy, high-dimensional inputs to predictions.
- Large Language Models (LLMs): When dealing with unstructured natural language, semantic reasoning, or open-ended text synthesis (like draft generation or user support), LLMs are the superior choice.
- Constraint Satisfaction Solvers (e.g., MiniZinc): When a scheduling or layout problem is highly constrained and requires search optimization over a huge space of combinations (like creating a school timetable or routing delivery trucks), a dedicated Constraint Satisfaction solver is far more efficient than writing a network of heuristic forward-chaining rules.
- Graph Databases and Semantic Web (SPARQL/RDF): If your domain model requires querying vast, richly interconnected networks of relationship data rather than running active production loops, graph databases are much cleaner and more performant than loading millions of facts into a Rete engine.
The Rete Algorithm scales poorly (order of N^2 where N is working memory size) for large working memories. The Rete Algorithm is optimized for large numbers of rules. In the modern age of LLMs, you can use coding agents help write and maintain large rule sets.
Optional Practice Problems
To solidify your understanding of the Rete algorithm, forward-chaining rule engines, and rule-based system design patterns, try implementing the following practice exercises. All files mentioned are located in the rete-algorithm directory.
Problem 1 (Easy): Diagnosis and Treatment for the Common Cold
Objective: Learn how to register new rules, define new conditions using comparison operators, and use Negated Condition Elements (NCEs) as guards.
In example_medical.py, the diagnosis system identifies cases of flu and generic fever. Extend this system to handle the Common Cold by doing the following:
- Define a rule
diagnose_coldthat fires when aPatienthas a temperature between37.0°C and38.0°C (inclusive, usingGe(37.0)andLe(38.0)) and has the symptom"congestion"or"sore throat". - Use an NCE guard
~Pat(Diagnosis, patient=Var("n"), condition=Eq("cold"))to prevent asserting duplicate diagnoses. - Define a rule
treat_coldthat fires when a patient has acolddiagnosis but no existingTreatmentfact asserted. It should assert aTreatmentaction"prescribe rest and hydration". - Add a test patient, e.g.,
Patient(name="Dave", temperature=37.5, symptoms=("congestion", "headache")), run the engine, and verify that the rules fire and output the correct diagnosis and treatment.
Problem 2 (Medium): Smart Home Safety Override and Engine Halting
Objective: Understand how to implement safety overrides, use rule priority (salience), and trigger immediate engine halting.
In example_smart_home.py, automation rules turn lights and heating on or off. However, in an emergency (like smoke or carbon monoxide detection), home automation must be overridden by a safety protocol.
-
Define a new fact type:
1 @dataclass(frozen=True) 2 class SafetyAlert(Fact): 3 hazard_type: str # e.g., "smoke", "carbon_monoxide"
- Write a high-priority rule (
salience=100) that triggers if aSensorReadingof type"smoke"or"gas"has a value of"detected". This rule should assert a correspondingSafetyAlertfact. -
Write a safety override rule (
salience=90) that triggers when aSafetyAlertfact is present. This rule must:- Modify any active
DeviceStatusof type"hvac"to"off"(to prevent circulating smoke/gas). - Modify any active
DeviceStatusof type"light"to"on"(to illuminate escape routes), regardless of whether the house mode is set to"away". - Call
ctx.halt()to stop the engine’s recognize-act cycle immediately, preventing further rule execution.
- Modify the execution block to assert a smoke sensor reading:
SensorReading(sensor_id="smoke_1", room="hallway", type="smoke", value="detected"). Run the engine and verify that the safety override takes precedence, adjusts all devices to safety states, and halts the engine before other low-priority rules execute.
Problem 3 (Hard): Cash-Constrained Portfolio Rebalancing (Multi-Phase Execution)
Objective: Coordinate multi-phase workflows with dependencies, enforce resource constraints, and perform state modifications in the rule actions (RHS).
In example_portfolio_rebalance.py, the engine recommends trades as RebalanceTrade facts but does not execute them. In real-world trading, you cannot purchase new assets without sufficient cash. Implement a cash-constrained trade execution model:
- Define rules to coordinate a two-phase rebalancing process: Phase 1 (Sells) followed by Phase 2 (Buys).
-
Write an execution rule
execute_sell_tradethat matches aRebalanceTradewithaction="sell". The rule should:- Modify the sold asset’s
AssetAllocationto subtract the trade amount. - Modify the
CASHasset’sAssetAllocationto add the cash generated from the sale. - Retract the
RebalanceTradefact.
-
Write an execution rule
execute_buy_tradethat matches aRebalanceTradewithaction="buy". To ensure cash is fully accounted for first, this rule must have a negated condition element preventing it from firing if any sell recommendations are still pending:~Pat(RebalanceTrade, action=Eq("sell"))
-
The
execute_buy_traderule must retrieve the current balance of theCASHallocation.- If cash is sufficient, it should execute the full buy (modify the asset allocation, subtract cash, and retract the trade).
- If cash is insufficient, it should scale down the buy to the remaining cash, perform the partial execution, update the cash to zero, and retract the trade (or cancel it with a warning).
- Set up a test portfolio where a large holding drift (e.g., excess
US_EQUITIES) must be sold first to generate the cash required to execute a pending buy (e.g., inBONDS). Run the engine and print the step-by-step transaction logs and the final asset allocations to confirm that cash limits were strictly respected.