Algorithm Suite
Yunque Agent ships with 9 production-grade algorithms that enhance retrieval, model selection, quality assessment, and scheduling. Each is backed by academic references and wired into the agent's runtime.
Overview
| Algorithm | Module | Purpose | Reference |
|---|---|---|---|
| BM25 | ledger/bm25.go | Sparse keyword retrieval | Robertson et al., 1995 |
| TF-IDF | ledger/tfidf.go | Memory importance scoring | Salton & Buckley, 1988 |
| Bandit | router/bandit.go | Online model selection | Auer et al. (UCB1), 2002 |
| Quality Scorer | quality/scorer.go | Zero-LLM response quality | PMIScore, N-gram Diversity |
| GraphRAG | ledger/graphrag.go | Community-based deep indexing | Microsoft GraphRAG |
| Isolation Forest | anomaly/iforest.go | Behavioral anomaly detection | Liu, Ting, Zhou, 2008 |
| HNSW | ledger/vector_hnsw.go | Approximate nearest neighbors | Malkov & Yashunin, 2018 |
| Recommendation | recommend/recommend.go | Skill/topic recommendation | Linden et al. (Amazon), 2003 |
| Q-Learning | rlsched/qlearning.go | Task scheduling optimization | Watkins & Dayan, 1992 |
BM25 Hybrid Retrieval
The BM25 index provides sparse keyword matching alongside dense vector search, combined via Reciprocal Rank Fusion (RRF).
Formula:
Score(D, Q) = Σ IDF(qi) × [tf(qi, D) × (k1 + 1)] / [tf(qi, D) + k1 × (1 - b + b × |D| / avgdl)]
- Parameters: k1 = 1.5, b = 0.75
- CJK Support: single-character tokenization for Chinese/Japanese/Korean
- Stop Words: bilingual (Chinese + English) filtering
- Integration: Recall Pipeline Stage 2.5 + Stage 4 score boost
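As a sketch, the per-term score above can be written directly in Go. The IDF variant used here (0.5-smoothed Robertson/Sparck-Jones) is an assumption, since the formula leaves IDF(qi) unspecified, and the function name is illustrative rather than the actual ledger/bm25.go API:

```go
package main

import (
	"fmt"
	"math"
)

// bm25Term computes one query term's contribution to Score(D, Q) using the
// stated parameters k1 = 1.5, b = 0.75. The IDF variant (0.5-smoothed
// Robertson/Sparck-Jones) is an assumption; ledger/bm25.go may differ.
func bm25Term(tf, docLen, avgDocLen float64, totalDocs, docFreq int) float64 {
	const k1, b = 1.5, 0.75
	idf := math.Log((float64(totalDocs)-float64(docFreq)+0.5)/(float64(docFreq)+0.5) + 1)
	return idf * tf * (k1 + 1) / (tf + k1*(1-b+b*docLen/avgDocLen))
}

func main() {
	// Term appearing twice in an average-length doc; 1000-doc corpus, df = 10.
	fmt.Printf("%.3f\n", bm25Term(2, 100, 100, 1000, 10))
}
```

Note how the `b` term penalizes documents longer than the corpus average, which is what keeps BM25 from favoring verbose documents in the fused ranking.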
TF-IDF Importance Scoring
Replaces hardcoded keyword matching for memory importance classification.
- Online IDF Table: incrementally updated as documents are added/removed
- TF normalization: log-normalization 1 + ln(tf)
- IDF smoothing: ln((N+1) / (df+1))
- Output: normalized score [0, 1] + top terms + specificity metric
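Combining the two formulas above gives a per-term weight; a minimal sketch (the [0, 1] normalization and top-term extraction are omitted, and the function name is illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// tfidfWeight applies the log-normalized TF and smoothed IDF given above.
// Score normalization and top-term extraction are omitted from this sketch.
func tfidfWeight(tf, totalDocs, docFreq int) float64 {
	if tf == 0 {
		return 0
	}
	tfw := 1 + math.Log(float64(tf))                           // 1 + ln(tf)
	idf := math.Log(float64(totalDocs+1) / float64(docFreq+1)) // ln((N+1)/(df+1))
	return tfw * idf
}

func main() {
	fmt.Printf("%.3f\n", tfidfWeight(3, 1000, 10))   // distinctive term
	fmt.Printf("%.3f\n", tfidfWeight(3, 1000, 1000)) // term in every doc weighs 0
}
```

The +1 smoothing keeps the IDF finite for unseen terms, which matters because the IDF table is updated online as documents come and go.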
Model Bandit (UCB1 + Thompson Sampling)
Multi-armed bandit for automatic model selection within each tier (fast/smart/expert).
Two strategies:
- UCB1 (deterministic): avgReward + sqrt(2 × ln(N) / ni), balancing exploitation and exploration
- Thompson Sampling (stochastic): Beta(successes + 1, failures + 1), for Bayesian exploration
The router calls bandit.Select(tier) before each LLM request and records the outcome with RecordOutcome() for continuous learning.
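The UCB1 path can be sketched as follows; the function names and slice-based arm representation are illustrative, not the real router API:

```go
package main

import (
	"fmt"
	"math"
)

// ucb1Index is the per-arm score avgReward + sqrt(2·ln(N)/ni).
// Untried arms get +Inf so every model is sampled at least once.
func ucb1Index(avgReward float64, totalPulls, armPulls int) float64 {
	if armPulls == 0 {
		return math.Inf(1)
	}
	return avgReward + math.Sqrt(2*math.Log(float64(totalPulls))/float64(armPulls))
}

// selectArm mimics one tier's selection: pick the highest UCB1 index.
func selectArm(avgRewards []float64, pulls []int) int {
	total := 0
	for _, p := range pulls {
		total += p
	}
	best, bestScore := 0, math.Inf(-1)
	for i, r := range avgRewards {
		if s := ucb1Index(r, total, pulls[i]); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

func main() {
	// Arm 1 has never been pulled, so UCB1 explores it despite arm 0's record.
	fmt.Println(selectArm([]float64{0.9, 0}, []int{10, 0})) // → 1
}
```

The exploration bonus shrinks as an arm accumulates pulls, so a consistently strong model in a tier is chosen more and more often without ever being locked in permanently.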
Quality Scorer
Zero-LLM, pure-statistical quality evaluation on 5 dimensions:
| Dimension | What it measures |
|---|---|
| Keyword Coverage | Query terms appearing in response |
| N-gram Diversity | Trigram uniqueness (detects degenerate output) |
| Length Ratio | Response length vs. query (sigmoid sweet spot 1–5x) |
| Information Density | Unique words / total words × log-length factor |
| Question Alignment | Question-type detection + response pattern matching |
Used in the reflection loop: Score ≥ 0.6 skips LLM evaluation, < 0.2 auto-fails. Saves ~60% of reflection LLM calls.
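The N-gram Diversity dimension can be sketched as a unique-trigram ratio; the exact statistic in quality/scorer.go may differ, and whitespace tokenization is an assumption of this sketch:

```go
package main

import (
	"fmt"
	"strings"
)

// trigramDiversity returns unique trigrams / total trigrams, so that
// degenerate, repetitive output scores low. Whitespace tokenization
// is an assumption; texts too short for a trigram default to 1.
func trigramDiversity(text string) float64 {
	tokens := strings.Fields(text)
	if len(tokens) < 3 {
		return 1.0
	}
	seen := make(map[string]struct{})
	total := len(tokens) - 2
	for i := 0; i < total; i++ {
		seen[strings.Join(tokens[i:i+3], " ")] = struct{}{}
	}
	return float64(len(seen)) / float64(total)
}

func main() {
	fmt.Printf("%.2f\n", trigramDiversity("the cat sat on the mat"))      // all trigrams unique
	fmt.Printf("%.2f\n", trigramDiversity("yes yes yes yes yes yes yes")) // degenerate output
}
```

Because statistics like this need no model call, the scorer can gate the reflection loop cheaply and escalate to an LLM only for the ambiguous middle band.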
GraphRAG
Community-based deep indexing inspired by Microsoft's GraphRAG paper.
- Leiden-style community detection: greedy modularity optimization + recursive splitting
- Multi-hop traversal: BFS subgraph exploration from seed nodes
- Community summarization: LLM-generated summaries via injectable callback
- Community search: keyword-based retrieval at the community level
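The multi-hop traversal step can be sketched as a bounded BFS; the adjacency-map representation and function name are assumptions, since ledger/graphrag.go may store edges differently:

```go
package main

import "fmt"

// multiHop sketches BFS subgraph exploration: collect every node within
// maxHops of a seed node. The adjacency-map graph is an assumption of
// this sketch, not the actual GraphRAG storage format.
func multiHop(adj map[string][]string, seed string, maxHops int) []string {
	visited := map[string]bool{seed: true}
	frontier := []string{seed}
	subgraph := []string{seed}
	for hop := 0; hop < maxHops && len(frontier) > 0; hop++ {
		var next []string
		for _, u := range frontier {
			for _, v := range adj[u] {
				if !visited[v] {
					visited[v] = true
					next = append(next, v)
					subgraph = append(subgraph, v)
				}
			}
		}
		frontier = next
	}
	return subgraph
}

func main() {
	adj := map[string][]string{"a": {"b"}, "b": {"c"}, "c": {"d"}}
	fmt.Println(multiHop(adj, "a", 2)) // [a b c]; d is 3 hops away
}
```

The resulting subgraph is what community detection partitions and the summarization callback condenses.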
Isolation Forest
Anomaly detection for behavioral monitoring (integrated with MetaCog).
- Training: random subsampling × N isolation trees (default: 100 trees, 256 samples)
- Scoring: s(x, n) = 2^(-E(h(x)) / c(n)), where c(n) is the average path length
- Range: 0 (normal) to 1 (anomalous)
- Use: detects unusual patterns in agent behavior metrics
HNSW Vector Index
Hierarchical Navigable Small World graph for approximate nearest neighbor search.
- Multi-layer navigation: skip-list inspired layer assignment
- Parameters: M = 16, Mmax0 = 32, efConstruction = 200
- Distance metric: cosine distance
- Complexity: O(log N) insert, sub-linear search
Experience Recommendation Engine
Personalized skill/topic recommendation combining 5 signals:
| Signal | Weight | Source |
|---|---|---|
| User Preference | 30% | Accumulated from feedback |
| Thompson Success Rate | 25% | Bayesian success/failure |
| Context Match | 20% | Category/tag overlap |
| Novelty | 15% | Inverse frequency bonus |
| Time Decay | 10% | Recency weighting |
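The blend of the five signals reduces to a weighted sum. This sketch assumes each signal is pre-normalized to [0, 1]; the struct and field names are illustrative:

```go
package main

import "fmt"

// Signals holds the five inputs, each assumed pre-normalized to [0, 1].
// Field names mirror the table above and are illustrative.
type Signals struct {
	Preference, ThompsonRate, ContextMatch, Novelty, Recency float64
}

// recommendScore blends the signals with the documented weights.
func recommendScore(s Signals) float64 {
	return 0.30*s.Preference +
		0.25*s.ThompsonRate +
		0.20*s.ContextMatch +
		0.15*s.Novelty +
		0.10*s.Recency
}

func main() {
	// A well-matched but overused skill vs. a fresh, promising one.
	familiar := Signals{Preference: 0.9, ThompsonRate: 0.8, ContextMatch: 0.9, Novelty: 0.1, Recency: 0.2}
	fresh := Signals{Preference: 0.4, ThompsonRate: 0.7, ContextMatch: 0.6, Novelty: 0.9, Recency: 0.9}
	fmt.Printf("%.3f vs %.3f\n", recommendScore(familiar), recommendScore(fresh))
}
```

Because the weights sum to 1, the blended score stays in [0, 1] and remains comparable across skills; the novelty and decay terms keep the top of the list from fossilizing around a few heavily-used skills.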
Q-Learning Scheduler
Tabular Q-Learning for task priority optimization.
- State: discretized features (queue length, time of day, task complexity)
- Actions: priority_high, priority_normal, priority_low, defer
- Update rule: Q(s,a) ← Q(s,a) + α × [r + γ × max Q(s',a') - Q(s,a)]
- Parameters: α = 0.1, γ = 0.95, ε = 0.15 (with decay)
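The update rule above can be sketched against a string-keyed table; the state encoding here is illustrative, since rlsched/qlearning.go discretizes its own features:

```go
package main

import "fmt"

// qUpdate applies Q(s,a) ← Q(s,a) + α × [r + γ × max Q(s',a') - Q(s,a)]
// with the documented α = 0.1, γ = 0.95. The string-keyed table and state
// encoding are a sketch, not the scheduler's actual discretization.
func qUpdate(q map[string]map[string]float64, s, a string, r float64, sNext string, actions []string) {
	const alpha, gamma = 0.1, 0.95
	best := q[sNext][actions[0]] // nil inner maps read as 0
	for _, a2 := range actions[1:] {
		if v := q[sNext][a2]; v > best {
			best = v
		}
	}
	if q[s] == nil {
		q[s] = make(map[string]float64)
	}
	q[s][a] += alpha * (r + gamma*best - q[s][a])
}

func main() {
	q := map[string]map[string]float64{}
	actions := []string{"priority_high", "priority_normal", "priority_low", "defer"}
	// Reward 1.0 for scheduling a complex task at high priority.
	qUpdate(q, "busy|complex", "priority_high", 1.0, "busy|simple", actions)
	fmt.Printf("%.2f\n", q["busy|complex"]["priority_high"]) // 0.10
}
```

With γ = 0.95 the scheduler values the downstream queue state almost as much as the immediate reward, while ε-greedy exploration (ε = 0.15, decaying) keeps it sampling alternative priorities.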