Algorithm Suite

Yunque Agent ships with 9 production-grade algorithms that enhance retrieval, model selection, quality assessment, and scheduling. Each is backed by academic references and wired into the agent's runtime.

Overview

Algorithm         Module                  Purpose                        Reference
BM25              ledger/bm25.go          Sparse keyword retrieval       Robertson et al., 1995
TF-IDF            ledger/tfidf.go         Memory importance scoring      Salton & Buckley, 1988
Bandit            router/bandit.go        Online model selection         Auer et al. (UCB1), 2002
Quality Scorer    quality/scorer.go       Zero-LLM response quality      PMI score, n-gram diversity
GraphRAG          ledger/graphrag.go      Community-based deep indexing  Microsoft GraphRAG
Isolation Forest  anomaly/iforest.go      Behavioral anomaly detection   Liu, Ting & Zhou, 2008
HNSW              ledger/vector_hnsw.go   Approximate nearest neighbors  Malkov & Yashunin, 2018
Recommendation    recommend/recommend.go  Skill/topic recommendation     Linden et al. (Amazon), 2003
Q-Learning        rlsched/qlearning.go    Task scheduling optimization   Watkins & Dayan, 1992

BM25 Hybrid Retrieval

The BM25 index provides sparse keyword matching alongside dense vector search, combined via Reciprocal Rank Fusion (RRF).

Formula:

Score(D, Q) = Σ IDF(qi) × [tf(qi, D) × (k1 + 1)] / [tf(qi, D) + k1 × (1 - b + b × |D| / avgdl)]
  • Parameters: k1 = 1.5, b = 0.75
  • CJK Support: single-character tokenization for Chinese/Japanese/Korean
  • Stop Words: bilingual (Chinese + English) filtering
  • Integration: Recall Pipeline Stage 2.5 + Stage 4 score boost

TF-IDF Importance Scoring

Replaces hardcoded keyword matching for memory importance classification.

  • Online IDF Table: incrementally updated as documents are added/removed
  • TF normalization: log-normalization 1 + ln(tf)
  • IDF smoothing: ln((N+1) / (df+1))
  • Output: normalized score [0, 1] + top terms + specificity metric

Model Bandit (UCB1 + Thompson Sampling)

Multi-armed bandit for automatic model selection within each tier (fast/smart/expert).

Two strategies:

  1. UCB1 (deterministic): avgReward + sqrt(2 × ln(N) / ni) — balances exploitation and exploration
  2. Thompson Sampling (stochastic): Beta(successes + 1, failures + 1) — Bayesian exploration

The router calls bandit.Select(tier) before each LLM request and records the outcome with RecordOutcome() for continuous learning.

Quality Scorer

Zero-LLM, pure-statistical quality evaluation on 5 dimensions:

Dimension            What it measures
Keyword Coverage     Query terms appearing in the response
N-gram Diversity     Trigram uniqueness (detects degenerate output)
Length Ratio         Response length vs. query length (sigmoid sweet spot 1–5x)
Information Density  Unique words / total words × log-length factor
Question Alignment   Question-type detection + response pattern matching

Used in the reflection loop: a score ≥ 0.6 skips LLM evaluation entirely, a score < 0.2 auto-fails the response, and scores in between fall through to LLM judgment. This eliminates roughly 60% of reflection LLM calls.

GraphRAG

Community-based deep indexing inspired by Microsoft's GraphRAG paper.

  • Leiden-style community detection: greedy modularity optimization + recursive splitting
  • Multi-hop traversal: BFS subgraph exploration from seed nodes
  • Community summarization: LLM-generated summaries via injectable callback
  • Community search: keyword-based retrieval at the community level

Isolation Forest

Anomaly detection for behavioral monitoring (integrated with MetaCog).

  • Training: random subsampling × N isolation trees (default: 100 trees, 256 samples)
  • Scoring: s(x, n) = 2^(-E(h(x)) / c(n)), where E(h(x)) is the mean path length across trees and c(n) is the average path length of an unsuccessful BST search over n samples (the normalization factor)
  • Range: 0 (normal) to 1 (anomalous)
  • Use: detects unusual patterns in agent behavior metrics

HNSW Vector Index

Hierarchical Navigable Small World graph for approximate nearest neighbor search.

  • Multi-layer navigation: skip-list inspired layer assignment
  • Parameters: M = 16, Mmax0 = 32, efConstruction = 200
  • Distance metric: cosine distance
  • Complexity: O(log N) insert, sub-linear search

Experience Recommendation Engine

Personalized skill/topic recommendation combining 5 signals:

Signal                 Weight  Source
User Preference        30%     Accumulated from feedback
Thompson Success Rate  25%     Bayesian success/failure
Context Match          20%     Category/tag overlap
Novelty                15%     Inverse frequency bonus
Time Decay             10%     Recency weighting

Q-Learning Scheduler

Tabular Q-Learning for task priority optimization.

  • State: discretized features (queue length, time of day, task complexity)
  • Actions: priority_high, priority_normal, priority_low, defer
  • Update rule: Q(s,a) ← Q(s,a) + α × [r + γ × max Q(s',a') - Q(s,a)]
  • Parameters: α = 0.1, γ = 0.95, ε = 0.15 (with decay)
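The update rule above is the whole learning step. A minimal sketch with the listed α and γ (the flat map keyed by (state, action) pairs is an assumption; the scheduler's actual table layout is not shown here):

```go
package main

import "fmt"

// qUpdate applies the tabular Bellman update:
// Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') − Q(s,a)].
func qUpdate(q map[[2]int]float64, s, a, sNext int, reward float64, actions []int) {
	const alpha, gamma = 0.1, 0.95
	best := 0.0
	for i, act := range actions {
		if v := q[[2]int{sNext, act}]; i == 0 || v > best {
			best = v
		}
	}
	key := [2]int{s, a}
	q[key] += alpha * (reward + gamma*best - q[key])
}

func main() {
	q := map[[2]int]float64{}
	actions := []int{0, 1, 2, 3} // priority_high, priority_normal, priority_low, defer
	// One transition with reward 1.0 from an empty table moves
	// Q(s=0, a=0) a fraction α of the way toward the target.
	qUpdate(q, 0, 0, 1, 1.0, actions)
	fmt.Printf("%.3f\n", q[[2]int{0, 0}]) // → 0.100
}
```

During action selection, ε-greedy exploration (ε = 0.15, decaying) picks a random action with probability ε and the argmax of the table otherwise.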

© 2025 云鸢科技(青岛)有限公司 × Dream Lab