Algorithm Suite

Yunque Agent ships with 9 production-grade algorithms that enhance retrieval, model selection, quality assessment, and scheduling. Each is backed by academic references and wired into the agent's runtime.

Overview

Algorithm         Module                  Purpose                        Reference
BM25              ledger/bm25.go          Sparse keyword retrieval       Robertson et al., 1995
TF-IDF            ledger/tfidf.go         Memory importance scoring      Salton & Buckley, 1988
Bandit            router/bandit.go        Online model selection         Auer et al. (UCB1), 2002
Quality Scorer    quality/scorer.go       Zero-LLM response quality      PMI score, n-gram diversity
GraphRAG          ledger/graphrag.go      Community-based deep indexing  Microsoft GraphRAG
Isolation Forest  anomaly/iforest.go      Behavioral anomaly detection   Liu, Ting & Zhou, 2008
HNSW              ledger/vector_hnsw.go   Approximate nearest neighbors  Malkov & Yashunin, 2018
Recommendation    recommend/recommend.go  Skill/topic recommendation     Linden et al. (Amazon), 2003
Q-Learning        rlsched/qlearning.go    Task scheduling optimization   Watkins & Dayan, 1992

BM25 Hybrid Retrieval

The BM25 index provides sparse keyword matching alongside dense vector search, combined via Reciprocal Rank Fusion (RRF).

Formula:

Score(D, Q) = Σ IDF(qi) × [tf(qi, D) × (k1 + 1)] / [tf(qi, D) + k1 × (1 - b + b × |D| / avgdl)]
  • Parameters: k1 = 1.5, b = 0.75
  • CJK Support: single-character tokenization for Chinese/Japanese/Korean
  • Stop Words: bilingual (Chinese + English) filtering
  • Integration: Recall Pipeline Stage 2.5 + Stage 4 score boost

TF-IDF Importance Scoring

Replaces hardcoded keyword matching for memory importance classification.

  • Online IDF Table: incrementally updated as documents are added/removed
  • TF normalization: log-normalization 1 + ln(tf)
  • IDF smoothing: ln((N+1) / (df+1))
  • Output: normalized score [0, 1] + top terms + specificity metric

Model Bandit (UCB1 + Thompson Sampling)

Multi-armed bandit for automatic model selection within each tier (fast/smart/expert).

Two strategies:

  1. UCB1 (deterministic): avgReward + sqrt(2 × ln(N) / ni) — balances exploitation and exploration
  2. Thompson Sampling (stochastic): Beta(successes + 1, failures + 1) — Bayesian exploration

The router calls bandit.Select(tier) before each LLM request and records the outcome with RecordOutcome() for continuous learning.

Quality Scorer

Zero-LLM, pure-statistical quality evaluation on 5 dimensions:

Dimension            What it measures
Keyword Coverage     Query terms appearing in the response
N-gram Diversity     Trigram uniqueness (detects degenerate output)
Length Ratio         Response length vs. query length (sigmoid sweet spot 1–5x)
Information Density  Unique words / total words × log-length factor
Question Alignment   Question-type detection + response pattern matching

Used in the reflection loop: a score ≥ 0.6 skips LLM evaluation entirely, a score < 0.2 auto-fails the response, and scores in between fall through to LLM judgment. This eliminates roughly 60% of reflection LLM calls.

GraphRAG

Community-based deep indexing inspired by Microsoft's GraphRAG paper.

  • Leiden-style community detection: greedy modularity optimization + recursive splitting
  • Multi-hop traversal: BFS subgraph exploration from seed nodes
  • Community summarization: LLM-generated summaries via injectable callback
  • Community search: keyword-based retrieval at the community level

Isolation Forest

Anomaly detection for behavioral monitoring (integrated with MetaCog).

  • Training: random subsampling × N isolation trees (default: 100 trees, 256 samples)
  • Scoring: s(x, n) = 2^(-E(h(x)) / c(n)), where E(h(x)) is the mean path length across trees and c(n) is the average path length of an unsuccessful BST search over n samples (the normalization factor)
  • Range: 0 (normal) to 1 (anomalous)
  • Use: detects unusual patterns in agent behavior metrics

HNSW Vector Index

Hierarchical Navigable Small World graph for approximate nearest neighbor search.

  • Multi-layer navigation: skip-list inspired layer assignment
  • Parameters: M = 16, Mmax0 = 32, efConstruction = 200
  • Distance metric: cosine distance
  • Complexity: O(log N) insert, sub-linear search

Experience Recommendation Engine

Personalized skill/topic recommendation combining 5 signals:

Signal                 Weight  Source
User Preference        30%     Accumulated from feedback
Thompson Success Rate  25%     Bayesian success/failure
Context Match          20%     Category/tag overlap
Novelty                15%     Inverse frequency bonus
Time Decay             10%     Recency weighting

Q-Learning Scheduler

Tabular Q-Learning for task priority optimization.

  • State: discretized features (queue length, time of day, task complexity)
  • Actions: priority_high, priority_normal, priority_low, defer
  • Update rule: Q(s,a) ← Q(s,a) + α × [r + γ × max Q(s',a') - Q(s,a)]
  • Parameters: α = 0.1, γ = 0.95, ε = 0.15 (with decay)
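The update rule above is the whole learning step. A minimal sketch with the listed α and γ (the flat map keyed by (state, action) pairs is an assumption; the scheduler's actual table layout is not shown here):

```go
package main

import "fmt"

// qUpdate applies the tabular Bellman update:
// Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') − Q(s,a)].
func qUpdate(q map[[2]int]float64, s, a, sNext int, reward float64, actions []int) {
	const alpha, gamma = 0.1, 0.95
	best := 0.0
	for i, act := range actions {
		if v := q[[2]int{sNext, act}]; i == 0 || v > best {
			best = v
		}
	}
	key := [2]int{s, a}
	q[key] += alpha * (reward + gamma*best - q[key])
}

func main() {
	q := map[[2]int]float64{}
	actions := []int{0, 1, 2, 3} // priority_high, priority_normal, priority_low, defer
	// One transition with reward 1.0 from an empty table moves
	// Q(s=0, a=0) a fraction α of the way toward the target.
	qUpdate(q, 0, 0, 1, 1.0, actions)
	fmt.Printf("%.3f\n", q[[2]int{0, 0}]) // → 0.100
}
```

During action selection, ε-greedy exploration (ε = 0.15, decaying) picks a random action with probability ε and the argmax of the table otherwise.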

© 2025 云鸢科技(青岛)有限公司 × Dream Lab