unirec — Building a Pluggable, RL-Friendly Recommendation Stack (WIP)

Last updated: Oct 22, 2025 (KST)

TL;DR

unirec is a modular recommender framework aimed at RL-friendly ranking, config-driven pipelines, and production-grade retrieval.

Key ideas

  • Plugin architecture: Retriever → Merger → Shaper → SlatePolicy → Evaluator/OPE, all swappable via YAML.
  • Typed interfaces: Encodable[TContext], Encoder[TContext], Encoded[TContext] with Profile/Session/Request contexts.
  • Deterministic caching & reproducibility: explicit fingerprints (class FQN & version) + canonicalization for NumPy/Torch values.
  • Retrieval @ scale: two-tower embeddings + FAISS; snapshot A/B or delta overlay; user vectors from profile/session.
  • RL-ready ranking: bandit UCB or slate-bandit, diversity shaping (MMR), and OPE (IPS/SNIPS/DR + 95% CI).

This is a work in progress; APIs are stabilizing, but the principles below are what we’re converging on.


Why another stack?

  1. Separable concerns — iterate on retrieval and ranking independently.
  2. Reproducibility — pipelines must be replayable offline; cache keys should encode what produced which vector.
  3. Practical RL — ranking benefits from bandits/RL; retrieval must stay fast and robust.

Architecture at a glance

[ CandidateRetriever ] --(pools)--> [ CandidateMerger ] --(set)--> [ CandidateShaper ]
  (FAISS / TwoTower)                                                 (MMR / filters)
                                                                          |
                                                                          v  (logs)
                                                                    CandidateSet
                                                                          |
                                                                          v
                                                           [ SlatePolicy (bandit/RL) ]
                                                                          |
                                                                          v
                                                                    PolicyOutput
                                                                    /           \
                                                                   v             v
                                                           [ Evaluator ]   [ OPEEstimator ]

Each component subclasses Component and implements a tight contract:

  • CandidateRetriever: search_one(state, k) -> list[Candidate]
  • CandidateMerger: merge(pools, user_id, topk) -> CandidateSet
  • CandidateShaper: shape(state) -> CandidateSet
  • SlatePolicy: select_slate(state) -> PolicyOutput
  • Evaluator/OPEEstimator: evaluate/estimate(state) -> dict

Each component declares component_kind as a class constant (e.g., "candidate_retriever"), and pipelines are assembled from YAML.


Context, Encodable, Encoder, Encoded

We separate what you encode from how you encode it.

  • Context: capability boundary with two horizons + optional request:

    • Profile[TContext]: stable identity & metadata (e.g., user id; item id; long-term features)
    • Session[TContext]: short-term signals (recent interactions, recency windows)
    • Request: per-request metadata (page, surface, A/B cell). Optional by design.
  • Encodable[TContext]: holds (profile, session, meta) and carries vector caches.
  • Encoder[TContext]: turns an Encodable[TContext] (+ optional Request) into an Encoded[TContext].
  • Encoded[TContext]: wraps the produced vector, delegates profile/session/meta to the origin Encodable, and carries optional Request.

This keeps retrieval fast (reusing cached vectors when keys match) and ranking flexible (request-aware when needed).

Minimal interface sketch

class Context(Versioned): ...

TContext = TypeVar("TContext", bound=Context)

class Profile(Context, Generic[TContext]):
    @property
    def id(self) -> int: ...

class Session(Context, Generic[TContext]): ...
class Request(Context): ...

class Encodable(Versioned, Generic[TContext]):
    def __init__(self, profile: Profile[TContext], session: Session[TContext], meta: Mapping[str, Any] | None = None): ...
    # holds vector caches keyed by (encoder_key, request_key)

class Encoded(Versioned, Generic[TContext]):
    def __init__(self, vector: Tensor, origin: Encodable[TContext], request: Request | None = None): ...
    @property
    def profile(self) -> Profile[TContext]: ...
    @property
    def session(self) -> Session[TContext]: ...

class Encoder(Versioned, Generic[TContext]):
    @abstractmethod
    def encode(self, encodable: Encodable[TContext], *, request: Request | None = None, **kw) -> Encoded[TContext]: ...

Why optional Request?

Retrieval should remain light and mostly request-agnostic; ranking may use request signals. Optionality simplifies caching and reproducibility.


Fingerprints & Versioning (repro first-class)

We attach a fingerprinter to generate canonical cache keys for encodings.

  • Head fields (always first): __fqn__ (class FQN), __hash_algo__, __hash_len__, __fingerprint_ver__, and optional __cls_version__ when inheriting Versioned.
  • Canonicalization:

    • floats: NaN/Inf/−0.0 normalized, fixed decimals
    • NumPy arrays: (shape, dtype, mean, var)
    • Torch tensors: CPU float32; compute (mean, var) with NaN/Inf handling (fallback when nanvar missing)
    • datetime → UTC ISO, UUID → hex, bytes → base64
    • mappings/sets/lists → stable ordering
  • Output: stable JSON of sorted (key, value) pairs, plus a digest (blake2s, configurable length).

Why head fields? Keys immediately disambiguate who produced them (FQN + version) before payload features.
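A minimal sketch of the fingerprinting idea, assuming illustrative field names and a reduced canonicalizer (floats, bytes, sets, mappings only; the NumPy/Torch summaries are omitted here):

```python
import base64
import hashlib
import json
import math

def _canon(v):
    """Canonicalize values so semantically equal inputs hash identically."""
    if isinstance(v, float):
        if math.isnan(v) or math.isinf(v):
            return repr(v)
        return round(v + 0.0, 6)          # normalizes -0.0, fixed decimals
    if isinstance(v, bytes):
        return base64.b64encode(v).decode()
    if isinstance(v, (set, frozenset)):
        return sorted(_canon(x) for x in v)
    if isinstance(v, dict):
        return {k: _canon(v[k]) for k in sorted(v)}
    if isinstance(v, (list, tuple)):
        return [_canon(x) for x in v]
    return v

def fingerprint(obj_fqn: str, payload: dict, hash_len: int = 16) -> str:
    # Head fields first: who produced the key, with what hashing scheme.
    doc = {"__fqn__": obj_fqn, "__hash_algo__": "blake2s",
           "__hash_len__": hash_len, "__fingerprint_ver__": 1}
    doc.update(_canon(payload))
    blob = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.blake2s(blob.encode(), digest_size=hash_len).hexdigest()

key = fingerprint("unirec.encoders.TwoTowerEncoder",
                  {"dim": 128, "lr": 0.001, "bias": -0.0})
```

Because -0.0 canonicalizes to 0.0 and JSON keys are sorted, two configurations that differ only in representation produce the same digest.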

Cache usage

  • enc_key = fingerprinter.key(encoder)
  • req_key = "∅" (if no request) or fingerprinter.key(request)
  • Encodable keeps:

    • vec_profile_cached[enc_key]
    • vec_session_cached[enc_key]
    • vec_enc_cached[(enc_key, req_key)]

Retrieval: Two-Tower + FAISS

  • Item tower: batch-trained in PyTorch, snapshots exported to FAISS. Index choices are pluggable (IVF/HNSW/PQ). Refresh via A/B snapshot, rolling, or delta overlay.
  • User tower: built from Profile (long-term) + Session (recent). Retrieval uses a light user vector; heavier user models are reserved for ranking.
  • Freshness gating / biasing: at retrieval, we don’t re-infer heavy features; we add cheap scalar biases (e.g., time decay) or hard filters. Save rich signals for ranking.
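The retrieval path above can be sketched in pure NumPy, with brute-force inner product standing in for a FAISS index (IndexFlatIP / IVF / HNSW at scale); the time-decay constants and the age signal are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_items, k = 32, 1000, 20

# Item snapshot: normalized vectors, as exported from the item tower.
item_vecs = rng.standard_normal((n_items, dim)).astype(np.float32)
item_vecs /= np.linalg.norm(item_vecs, axis=1, keepdims=True)

# Light user vector built from profile + session features.
user_vec = rng.standard_normal(dim).astype(np.float32)
user_vec /= np.linalg.norm(user_vec)

scores = item_vecs @ user_vec                 # what index.search() would return
age_days = rng.integers(0, 30, size=n_items)  # cheap precomputed freshness signal
biased = scores + 0.01 * np.exp(-age_days / 7.0)  # scalar bias, no model inference

top_k = np.argsort(-biased)[:k]
```

Note that the bias is a cheap additive scalar over already-computed scores; nothing in the hot path re-runs a model per request.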

Ranking: Bandits & RL

  • Default: sequential slate policy with UCB-style scoring (position weights × per-item UCB scores over the residual picks, with an MMR diversity penalty).
  • RL-ready: treat ranking as a slate bandit (stateful, position-aware), not independent slots. PPO-style policies can come later; bandits get you far with simpler infra.
  • Metrics: NDCG / Recall / ILD / Entropy / Coverage. OPE: IPS / SNIPS / Doubly-Robust with paired bootstrap for 95% CI.
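A hedged sketch of the default slate policy: greedy position-by-position picks scored by UCB, weighted by position, and penalized for MMR-style similarity to items already on the slate. All names and constants here are illustrative, not unirec's actual API:

```python
import numpy as np

def select_slate(mean, count, item_vecs, slate_size=5,
                 c=1.0, lam=0.5, pos_weights=None):
    """Sequentially fill a slate using UCB scores minus an MMR penalty."""
    n = len(mean)
    total = max(int(count.sum()), 1)
    ucb = mean + c * np.sqrt(np.log(total + 1) / (count + 1))   # UCB per item
    if pos_weights is None:
        pos_weights = 1.0 / np.log2(np.arange(2, slate_size + 2))  # DCG-like
    chosen: list[int] = []
    for pos in range(slate_size):
        best, best_score = -1, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            # MMR: penalize max similarity to already-picked items.
            sim = max((float(item_vecs[i] @ item_vecs[j]) for j in chosen),
                      default=0.0)
            score = pos_weights[pos] * ucb[i] - lam * sim
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

rng = np.random.default_rng(1)
vecs = rng.standard_normal((10, 8)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
slate = select_slate(mean=rng.random(10),
                     count=rng.integers(1, 50, 10),
                     item_vecs=vecs)
```

The O(slate_size × n) greedy loop is the point: diversity is enforced only among the handful of candidates that survive retrieval, never pairwise across the full catalog.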

What’s intentionally not in retrieval

  • Re-running heavy MLPs for per-request features
  • Online fine-tuning of item vectors inside the hot path
  • O(n²) pairwise penalties across millions of items

These belong to ranking or offline training/index builds.


Roadmap (near-term)

  • Trainers for two-tower (PyTorch) with export hooks → FAISS builders
  • Concrete User* / Item* context families and default encoders
  • Stronger delta overlay index option with periodic materialization
  • More slate policies (ε-greedy, TS, position-dependent UCB variants)
  • Live metrics dashboard & offline/online parity checks

Closing notes

unirec is pragmatic by design: fast retrieval, smart ranking, reproducible caches.
If you’re experimenting with RL for recsys, this split gives you safe iteration loops: swap modules, keep contracts.

Feedback welcome — this is WIP and evolving.

Tags: Recommendation · Reinforcement Learning · Contextual Bandit